Cumberland University: Difference Between Data Analytics & Data Mining Discussion
There is much discussion regarding Data Analytics and Data Mining. Sometimes these terms are used synonymously, but there is a difference. What is the difference between Data Analytics vs Data Mining? Please provide an example of how each is used.
Minimum of 2-3 pages
Provide extensive additional information on the topic
Explain, define, or analyze the topic in detail
Share an applicable personal experience
Provide an outside source (example: a scholarly article from the UC Library) that applies to the topic, along with additional information about the topic or the source (please cite properly in APA)
Learning Analytics or Educational Data Mining? This is the Question…
Daniela Marcu
Ștefan cel Mare University of Suceava
Str. Universității 13, Suceava 720229
Phone: 0230 216 147
mdaniela.marcu@yahoo.ro
Mirela Danubianu
Ștefan cel Mare University of Suceava
Str. Universității 13, Suceava 720229
Phone: 0230 216 147
mdanub@eed.usv.ro
Abstract
Education, a vital and rapidly expanding area, could not remain indifferent to the use of information and communication technology. Over the past two decades we have witnessed the emergence and development of e-learning systems, the proliferation of MOOCs and, more generally, the rise of Technology Enhanced Education. All of these have contributed to the generation and storage of unprecedented volumes of data concerning all areas of learning.
At the same time, domains such as data mining and big data analytics have emerged and
developed. Their applications in education have spawned new areas of research such as educational
data mining or learning analytics.
As an interdisciplinary research area, Educational Data Mining (EDM) explores data from educational environments in order to build models that help us better understand students' behavior and results. EDM is a complex process consisting of several steps grouped into three stages: data preprocessing, modelling and postprocessing. It transforms raw data from educational environments into useful information that can positively influence the educational process.
According to the Society for Learning Analytics Research (SoLAR), which adopted the wording of the first International Conference on Learning Analytics and Knowledge, learning analytics is "the measurement, collection, analysis and reporting of data about learners and their contexts for purposes of understanding and optimizing learning and the environments in which it occurs" (Siemens, 2011).
This paper proposes a comparative study of the two concepts: EDM and learning analytics. Because some researchers claim that the two terms refer to the same thing, we want to highlight the similarities and differences between them, and how each can help raise the quality of educational processes.
Keywords: EDM; LA; Data Mining; Education.
1. Introduction
The educational community is interested in the great potential of data mining for education. Why are researchers so enthusiastic about this? The answer is simple. Having seen the impact of applying data mining to large data volumes from areas such as business, social media and other sciences, we can anticipate its benefits for the education system. If we could adapt to the educational environment the methods used to find patterns in the online activity of customers and social media users, we could obtain evidence that more closely reflects the reality of the activities in the education system.
The widespread use of computer-based learning in pre-university education and the development of Web-based courses are additional reasons for EDM and LA research.
Designing educational policies based on practical evidence provided by researchers can
bring benefits to the educational system.
BRAIN – Broad Research in Artificial Intelligence and Neuroscience
Volume 10, Special Issue 2 (October, 2019), ISSN 2067-3957
The exploitation of large volumes of data from different domains is done using specific techniques and methods, which help develop tools that facilitate progress in these areas.
The science of extracting useful information from large volumes of data is called Data
Mining (DM) (Hand, Mannila & Smyth, 2001).
The concept is based on three key areas: statistics, artificial intelligence and machine
learning (Figure 1).
Figure 1. Data Mining
Initially, DM used statistical algorithms. Later, specific techniques such as decision trees, association rules, clustering, artificial neural networks and others were developed (Șușnea, 2012).
Applying data exploitation methods to data from the educational system in order to build models that help us better understand students' behavior and outcomes is called Educational Data Mining (EDM). Since educational data and problems differ from those in other areas, classical DM methods have been improved and supplemented with EDM-specific methods (Romero & Ventura, 2007). According to some authors, EDM has four areas of application: improving student models, improving domain models, e-learning, and scientific research (Baker, 2012).
In order to better understand learning, data about pupils and their educational environment are measured, collected and analyzed. This is learning analytics, a field related to EDM. Among the Learning Analytics (LA) methods we can list:
content analysis
discourse analysis
analyzing the social dimension of learning (Ferguson & Buckingham Shum, 2012).
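Analyzing the social dimension of learning often starts from who interacts with whom. The sketch below is a minimal, hypothetical illustration, not a method taken from the cited work: it assumes forum replies are available as (replier, original poster) pairs, builds an undirected interaction graph, and counts each student's distinct discussion partners (degree centrality). Students with no partners are flagged as isolated. All names and data are invented.

```python
from collections import defaultdict

def degree_centrality(replies, students):
    """Count distinct discussion partners per student.

    replies: iterable of (replier, original_poster) pairs (hypothetical forum data).
    students: all enrolled students, so silent students appear with degree 0.
    """
    neighbours = defaultdict(set)
    for a, b in replies:
        if a != b:  # ignore replies to one's own post
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {s: len(neighbours[s]) for s in students}

# Hypothetical reply log from a course forum
replies = [("ana", "ion"), ("ion", "ana"), ("maria", "ana"), ("dan", "dan")]
students = ["ana", "ion", "maria", "dan"]

degrees = degree_centrality(replies, students)
isolated = [s for s, d in degrees.items() if d == 0]
print(degrees)   # {'ana': 2, 'ion': 1, 'maria': 1, 'dan': 0}
print(isolated)  # ['dan']
```

A student who only replies to their own posts (here "dan") counts as isolated, which is the kind of signal a teacher might follow up on.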
In the following sections we detail relevant aspects of EDM and LA in order to provide sound arguments for a comparative study of the two concepts.
2. Educational Data Mining
Over the past 10 years, the field of research that exploits the unique types of data coming from education has developed considerably at the international level. In 2011, in Massachusetts, USA, the International EDM Working Group (established in 2007) created the International Society for EDM (online: http://educationaldatamining.org/about/). Romania, however, is still at a pioneering stage in EDM. There is currently a growing interest in using computers in learning and in Web-based training. With the rapid increase in the volume of learning software resources, the Romanian educational system also accumulates huge amounts of data from students, teachers, parents, libraries, secretariats, etc. Obtaining the information needed to build models that improve the quality of managerial decisions has become one of the greatest present-day challenges.
Traditional research in education is time-consuming and often wasteful of material resources. An experimental study on, for example, combating school absenteeism first requires selecting schools, teachers and pupils. Next, strategies must be defined for identifying sources of school stress and for increasing students' motivation to attend classes, their trust in the school and in the family, and so on. However, such studies depend on context, class, geography, economic development and teacher-student relationships. Changing any parameter can lead to very different conclusions, and new factors contributing to students' demotivation, which could not be taken into account earlier, may soon appear. Conducting new traditional studies on this topic therefore requires considerable time.
D. Marcu, M. Danubianu – Learning Analytics or Educational Data Mining? This is the Question…
By comparison, EDM proves to be more efficient. Analyzing existing data in the educational system with specific EDM methods makes it possible to identify new models for new contexts. A major advantage is that the same methods can be applied to different data sets, producing context-specific results without the need to design new analysis strategies.
More specifically, consider a course designed for Web-based training (Romero, Ventura & De Bra, 2004). Traditionally, the effectiveness of a course is evaluated by analyzing the results students obtain upon completing it, which does not necessarily lead to improvements in the material, methods or teaching tools used in future versions of the course. In fact, in the Romanian pre-university system, curricula and educational resources are not updated as often as society expects.
What would EDM-based data exploitation add? EDM methods aim to discover correlation rules between course components (content, questions, various activities) and student activities. In "Knowledge discovery with genetic programming for providing feedback to courseware authors", C. Romero, S. Ventura and P. De Bra describe the four main steps in building EDM-based software (Romero, Ventura & De Bra, 2004): development, use, knowledge discovery, and improvement.
Another classification has three stages: preprocessing, data exploitation and postprocessing [3]. The cycle of these steps is illustrated in Figure 2.
Figure 2. Stages of the process of converting data into information
Returning to the analysis of a course's efficiency, the first stage, preprocessing, involves operations such as:
the teacher creates the content and provides information on pedagogical and methodological
aspects
the teacher creates course support
the student uses the course
the EDM software records information about: the student’s time spent in the course, the
sections visited, the scores obtained and other interactions
the information collected is converted into data with a format appropriate for processing.
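The last preprocessing step, converting recorded interactions into data suitable for processing, can be sketched as follows. This is a minimal illustration under assumed inputs: the event log format (student, section, seconds spent, score) and all names are hypothetical, not the format of any particular EDM tool. Each student's raw events are aggregated into one record with total time, number of distinct sections visited and mean score.

```python
from collections import defaultdict

# Hypothetical raw log: (student_id, section, seconds_spent, score or None)
raw_log = [
    ("s1", "intro", 120, None),
    ("s1", "quiz1", 300, 8.0),
    ("s2", "intro", 60, None),
    ("s2", "quiz1", 240, 6.0),
    ("s2", "quiz2", 180, 7.0),
]

def to_feature_table(log):
    """Aggregate raw interaction events into one feature record per student."""
    time = defaultdict(int)
    sections = defaultdict(set)
    scores = defaultdict(list)
    for student, section, seconds, score in log:
        time[student] += seconds
        sections[student].add(section)
        if score is not None:
            scores[student].append(score)
    table = {}
    for student in time:
        s = scores[student]
        table[student] = {
            "total_minutes": time[student] / 60,
            "sections_visited": len(sections[student]),
            "mean_score": sum(s) / len(s) if s else None,
        }
    return table

features = to_feature_table(raw_log)
print(features["s2"])  # {'total_minutes': 8.0, 'sections_visited': 3, 'mean_score': 6.5}
```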
In the next step, EDM-specific algorithms are applied to obtain different correlation rules. The models provide information in different formats for analysis: numerical coefficient values, tables, diagrams and correlation matrices (an example is shown in Appendix 1: a correlation matrix obtained with the DataLab application from the results of the Computer Science Olympiad).
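A correlation matrix of this kind can be computed directly from per-student results. The sketch below uses Pearson's correlation coefficient on invented scores; it is not the DataLab application, whose implementation is not described here, and the column names are placeholders.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Return {(name_i, name_j): r} for every ordered pair of named columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Hypothetical results for five students
data = {
    "problem1": [10, 20, 30, 40, 50],
    "problem2": [12, 24, 33, 41, 55],
    "total":    [22, 44, 63, 81, 105],
}
matrix = correlation_matrix(data)
for (a, b), r in matrix.items():
    if a < b:  # print each unordered pair once
        print(f"{a} vs {b}: {r:.3f}")
```

The diagonal is always 1 and the matrix is symmetric, so a report usually shows only one triangle.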
One of the most important forms of discovered knowledge is the if-then rule. Several types of such rules can be defined in EDM: association, classification and prediction rules (Klosgen & Zytkow, 2002).
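An association rule has the form "IF antecedent THEN consequent" and is commonly judged by its support (how often antecedent and consequent occur together) and its confidence (how often the consequent holds when the antecedent does). The minimal sketch below evaluates one given rule on invented student records; the attribute names are hypothetical, and a full rule miner such as Apriori would also search for candidate rules rather than test a single one.

```python
def rule_metrics(records, antecedent, consequent):
    """Support and confidence of the rule IF antecedent THEN consequent.

    antecedent / consequent: dicts mapping attribute -> required value.
    """
    def matches(record, conditions):
        return all(record.get(k) == v for k, v in conditions.items())

    n = len(records)
    n_ante = sum(matches(r, antecedent) for r in records)
    n_both = sum(matches(r, antecedent) and matches(r, consequent) for r in records)
    support = n_both / n
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence

# Hypothetical records: did the student do the practice quiz, did they pass the exam?
students = [
    {"practice_quiz": "done", "exam": "pass"},
    {"practice_quiz": "done", "exam": "pass"},
    {"practice_quiz": "done", "exam": "fail"},
    {"practice_quiz": "skipped", "exam": "fail"},
]
support, confidence = rule_metrics(students, {"practice_quiz": "done"}, {"exam": "pass"})
print(support, confidence)  # 0.5 0.6666666666666666
```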
The teacher then analyzes the results and assesses the extent to which the initial goals have been achieved.
Depending on the conclusions, the teacher may decide to improve the course and restart its evaluation process. This may prove difficult because opinions can differ significantly from one teacher to another regarding the material and the way the course interacts with the student.
3. Methods of data exploitation
There is currently a wide variety of methods for exploiting data in the education system. According to their objectives, they can be divided into two broad categories:
predictive: Prediction, Classification, Regression, Outlier Detection
descriptive: Clustering, Determination of association rules, Discovery of data for human judgment (Sasu, 2014).
Many of these are general DM methods: prediction, classification, clustering, text mining and others. But there are also EDM-specific methods, such as nonnegative matrix factorization and knowledge tracing (KT) (Romero & Ventura, 2012). Some of these are described below:
Prediction
The method can be used in education to predict students' behavior and outcomes. It is based on building predictive models. In the training phase, a model learns to predict a target variable from a combination of other variables, called predictors. Once the training phase is complete, the model can be applied to the data sets for which predictions are needed. A well-known example is the study by Baker, Gowda and Corbett, "Automatically detecting the student's preparation for future learning: help use is key" (Baker, Gowda & Corbett, 2011). The authors create a tool that automatically predicts a student's future performance by establishing positive or negative correlations with features such as the student's test results, response time, the time elapsed between receiving a hint and typing the answer, and others. The model is built on one group of students and then applied to another group. The results are then compared with those obtained using the Bayesian Knowledge Tracing (BKT) model.
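Bayesian Knowledge Tracing, the comparison baseline mentioned above, estimates the probability that a student has mastered a skill and updates it after each answer. The sketch below implements the standard BKT update with four parameters: initial mastery `p_init`, learning (transition) `p_transit`, guess `p_guess` and slip `p_slip`. The parameter values are illustrative assumptions; in practice they are fitted to data for each skill.

```python
def bkt_trace(answers, p_init=0.3, p_transit=0.1, p_guess=0.2, p_slip=0.1):
    """Return P(skill mastered) after each observed answer (True = correct).

    Standard Bayesian Knowledge Tracing: a Bayesian update on the observation,
    followed by a chance p_transit of learning the skill at this opportunity.
    """
    p_known = p_init
    history = []
    for correct in answers:
        if correct:
            evidence = p_known * (1 - p_slip)
            posterior = evidence / (evidence + (1 - p_known) * p_guess)
        else:
            evidence = p_known * p_slip
            posterior = evidence / (evidence + (1 - p_known) * (1 - p_guess))
        p_known = posterior + (1 - posterior) * p_transit
        history.append(p_known)
    return history

def p_correct_next(p_known, p_guess=0.2, p_slip=0.1):
    """Predicted probability that the next answer is correct."""
    return p_known * (1 - p_slip) + (1 - p_known) * p_guess

# Hypothetical answer sequence: one error followed by three correct answers
trace = bkt_trace([False, True, True, True])
print([round(p, 3) for p in trace])
print(round(p_correct_next(trace[-1]), 3))
```

An incorrect first answer lowers the mastery estimate below `p_init`; each subsequent correct answer raises it.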
Classification
The method involves building a predictive model. The data in the training set are described by certain attributes, and the model must assign each record to a class based on its set of attributes. Suppose we have built educational software in the form of an interactive game on a given topic. Based on user attributes such as age, gender, geographic area, time taken to complete the game and number of attempts, we can build a classifier that assigns each user to a specific class. The model thus learns to identify categories of students. The analyses can indicate whether this educational method is suitable for certain age groups, interests and levels of education.
Classification methods include decision trees, neural networks, Bayesian classifiers, and others.
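Decision trees are the first classification method listed. The sketch below builds the simplest possible tree, a single split (a "decision stump"), choosing the attribute and threshold that minimise weighted Gini impurity. The game-log attributes and class labels are invented to match the example above; a real study would use a full tree learner and evaluate it on held-out data.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, features):
    """Find the (feature, threshold) split with the lowest weighted Gini impurity."""
    best = None
    for f in features:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue  # a split must separate the data
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold, majority(left), majority(right))
    _, f, threshold, left_class, right_class = best
    return lambda r: left_class if r[f] <= threshold else right_class

# Hypothetical game logs: minutes to finish, number of attempts, assigned class
rows = [
    {"minutes": 12, "attempts": 1}, {"minutes": 15, "attempts": 2},
    {"minutes": 35, "attempts": 5}, {"minutes": 40, "attempts": 6},
    {"minutes": 14, "attempts": 2}, {"minutes": 38, "attempts": 4},
]
labels = ["fast", "fast", "needs_support", "needs_support", "fast", "needs_support"]

classify = fit_stump(rows, labels, ["minutes", "attempts"])
print(classify({"minutes": 30, "attempts": 5}))
```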
Clustering
The method builds models that group data according to their similarity. For good-quality results, similarity within each cluster must be maximized and similarity between clusters minimized.
In Romanian high school education, this method could be used to group pupils according to their learning style (auditory, visual, practical-kinesthetic), based on an analysis of their behavior in relation to certain educational products and of their characteristics. The output of such a model could support effective recommendations on how to learn educational content, so that instruction could be adapted efficiently to each student's learning particularities. At present, there are attempts to conduct lessons in a way appropriate to students' learning styles, but in reality learning styles are identified only superficially. The questionnaire results are attached to the class register, but in most cases this does not lead to improved teaching methods and techniques. In the absence of clear alternatives, the teacher has to improvise.
Clustering is also used successfully in plagiarism detection (text mining), including in the educational sphere.
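Grouping pupils by learning style could, for example, use k-means clustering on per-pupil scores. The sketch below is a minimal k-means (Lloyd's algorithm) on hypothetical scores for visual, auditory and kinesthetic activities; the data, the choice k = 3 and the deterministic initialisation from the first k points are simplifying assumptions, and clusters found this way would still need a teacher's interpretation.

```python
def kmeans(points, k, iterations=100):
    """Lloyd's k-means; returns (centroids, cluster index for each point)."""
    centroids = [list(p) for p in points[:k]]  # deterministic init: first k points
    assignment = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments are stable: converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment

# Hypothetical pupil scores: (visual, auditory, kinesthetic) activity results, 0-10
pupils = [
    (9, 2, 3), (2, 9, 3), (3, 2, 9),
    (8, 3, 2), (2, 8, 2), (2, 3, 8),
    (9, 1, 2), (1, 9, 2), (3, 1, 8),
]
centroids, groups = kmeans(pupils, k=3)
print(groups)  # [0, 1, 2, 0, 1, 2, 0, 1, 2]
```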
Outlier Detection
The method builds models that detect data whose features differ markedly from the rest. In Romanian education, it could be used to detect students who have difficulty assimilating content, or students with atypical behavior.
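One simple way to flag pupils whose results differ markedly from their peers is the interquartile-range (IQR) rule: values more than 1.5 IQR below the first quartile or above the third quartile are treated as outliers. The sketch uses Python's standard `statistics.quantiles`; the grades are invented, and this rule is only an illustration, not the method of any study discussed in this paper.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical test grades (1-10 scale) for one class
grades = [7, 8, 6, 7, 9, 8, 7, 2, 8, 7]
print(iqr_outliers(grades))  # [2]
```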
Case studies generally combine several EDM methods. Outlier detection can be used, for example, together with clustering and decision tree classification, as in the study by Ajith, Sai and Tejaswi, "Evaluation of student performance: an outlier detection perspective" (Ajith, Sai & Tejaswi, 2013). The study aims to identify learners with special learning needs in order to reduce the school failure rate. Input data are collected from class attendance, tests and initial test grades. To achieve this objective, the authors seek models for classifying students that can help in setting up study groups.
At present, students in Romanian state high schools do not have the opportunity to study course material in groups other than the classes they belong to. Moreover, pupils diagnosed with special educational needs attend classes together with their other classmates. Teachers create special programs for them, but the lessons are then held under the guidance of a single teacher who has no pedagogical or methodological experience with this learning situation! Such situations impose special requirements on the educational process: students must be grouped in the same educational space and timeframe while working through different course materials. Without proper classification, alternative methods and means, and teachers with such experience, the outcomes only partly approach the best possible results.
Discovery with Models
Discovery with Models is the fifth category in Baker's taxonomy (Baker, 2012) and one of the most widely used data exploitation methods in education. It uses a previously validated model as a component in further analyses, such as prediction or relationship mining, in new contexts (Baker & Yacef, 2009). In this way, it is possible to find out which educational materials contribute most to educational progress. A 2008 study by Beck and Mostow, "How who should practice: Using learning decomposition to evaluate the efficacy of different types of practice for different types of students" (Beck & Mostow, 2008), analyzes different types of learners and shows that the method helps identify relationships between student behavior and the characteristics of the variables used.
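In discovery with models, the output of an already validated model becomes an input to a new analysis. The sketch below is purely illustrative: the "validated" model is a hypothetical fixed logistic detector of careless answering with invented weights, and its per-student estimates are then correlated with post-test scores. The features, weights and scores are not taken from the studies cited above.

```python
from math import exp, sqrt

# Step 1: a previously validated model, reused as a fixed component.
# Hypothetical detector of careless answering; the weights are invented.
WEIGHTS = {"intercept": 1.0, "mean_response_s": -0.2, "hints": 0.5}

def careless_probability(student):
    """Logistic model output in (0, 1) for one student's summary features."""
    z = (WEIGHTS["intercept"]
         + WEIGHTS["mean_response_s"] * student["mean_response_s"]
         + WEIGHTS["hints"] * student["hints"])
    return 1 / (1 + exp(-z))

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Step 2: apply the model to new data and relate its output to another variable.
students = [
    {"mean_response_s": 4, "hints": 3, "post_test": 55},
    {"mean_response_s": 12, "hints": 1, "post_test": 80},
    {"mean_response_s": 8, "hints": 2, "post_test": 68},
    {"mean_response_s": 15, "hints": 0, "post_test": 90},
]
estimates = [careless_probability(s) for s in students]
r = pearson(estimates, [s["post_test"] for s in students])
print(f"model estimate vs post-test score: r = {r:.2f}")
```

The model itself is not retrained here; the new question is only about how its output relates to a variable it was never built to predict.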
Nonnegative Matrix Factorization (or Decomposition)
There are several algorithms for nonnegative matrix factorization. It decomposes (factorizes) a matrix V into two matrices W and H, with the property that all three matrices have nonnegative elements. This is very useful in applications such as determining the effectiveness of an evaluation system, where the matrices relate exams, skills and items. Matrix V is approximated by the product of the two smaller matrices, as can be seen in Figure 3 ("Non-negative matrix factorization", 2019).
Figure 3. Illustration of approximate non-negative matrix factorization. Source: wikipedia.org
Consider the assessment of two specific skills, represented by the columns of matrix W, for 4 work requirements (items), represented by the four rows of W. Matrix H contains two rows, one for each skill, and 6 columns, one for each assessed student. The result is recorded in matrix V, which has 4 rows (one per item) and 6 columns (one per student). A value of 1 in matrix W indicates that an item requires the corresponding skill (Figure 4) (Desmarais, 2012).
$$
\underbrace{\begin{pmatrix}
0 & 0 & 1 & 1 & 0 & 0\\
1 & 1 & 1 & 0 & 1 & 1\\
1 & 1 & 1 & 0 & 1 & 1\\
1 & 1 & 2 & 1 & 1 & 1
\end{pmatrix}}_{V\ (\text{items} \times \text{students})}
=
\underbrace{\begin{pmatrix}
0 & 1\\
1 & 0\\
1 & 0\\
1 & 1
\end{pmatrix}}_{W\ (\text{items} \times \text{skills})}
\underbrace{\begin{pmatrix}
1 & 1 & 1 & 0 & 1 & 1\\
0 & 0 & 1 & 1 & 0 & 0
\end{pmatrix}}_{H\ (\text{skills} \times \text{students})}
$$
Figure 4. Non-negative matrix factorization – example
The first item requires skill 2 (W[1][2] = 1). Only students 3 and 4 have skill 2, so students 1, 2, 5 and 6 will not pass item 1.
Passing item 4 requires both skills. Only one student (student 3) has both, and will pass this item with the maximum score.
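The reading of this example can be checked mechanically. The sketch below multiplies matrix W (items × skills, often called a Q-matrix) by the skill matrix H (skills × students) from Figure 4 and, assuming a conjunctive interpretation (an item is passed only if the student has every skill it requires), lists the students who pass each item. The conjunctive rule is an assumption made for illustration.

```python
W = [[0, 1],   # item 1 requires skill 2
     [1, 0],   # item 2 requires skill 1
     [1, 0],   # item 3 requires skill 1
     [1, 1]]   # item 4 requires both skills
H = [[1, 1, 1, 0, 1, 1],   # skill 1, students 1..6
     [0, 0, 1, 1, 0, 0]]   # skill 2, students 1..6

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

V = matmul(W, H)

def passing_students(item):
    """1-based numbers of students holding every skill required by row `item` (0-based)."""
    required = [s for s, need in enumerate(W[item]) if need]
    return [j + 1 for j in range(len(H[0])) if all(H[s][j] for s in required)]

for i in range(len(W)):
    print(f"item {i + 1}: V row = {V[i]}, passed by students {passing_students(i)}")
```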
With computerized analysis methods, such interpretations can be obtained in a much shorter time and with greater accuracy, since machines process this kind of data faster and more reliably than humans.
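In practice W and H are not known in advance; NMF estimates them from the observed matrix V. The sketch below uses Lee and Seung's multiplicative update rules for the Frobenius-norm objective, written in plain Python. It is one of the several possible NMF algorithms, chosen here for brevity; the iteration count, random initialisation and the small constant `eps` are assumptions. NMF recovers factors only up to scaling and permutation, so the learned W need not equal the binary matrix of Figure 4.

```python
import random

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def frobenius_error(v, w, h):
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5

def nmf(v, rank, iterations=500, seed=0, eps=1e-9):
    """Factor nonnegative v (m x n) into w (m x rank) and h (rank x n).

    Returns (w, h, errors), where errors records ||v - w h||_F at each step.
    """
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() for _ in range(n)] for _ in range(rank)]
    errors = []
    for _ in range(iterations):
        errors.append(frobenius_error(v, w, h))
        # H <- H * (W^T V) / (W^T W H); eps avoids division by zero
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(m)]
    errors.append(frobenius_error(v, w, h))
    return w, h, errors

# V from Figure 4 (items x students)
V = [[0, 0, 1, 1, 0, 0],
     [1, 1, 1, 0, 1, 1],
     [1, 1, 1, 0, 1, 1],
     [1, 1, 2, 1, 1, 1]]
W_est, H_est, errors = nmf(V, rank=2)
print(f"reconstruction error: {errors[0]:.3f} -> {errors[-1]:.3f}")
```

Because the updates only multiply by nonnegative ratios, W and H stay nonnegative throughout, which is why this scheme is a common first choice for NMF.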
4. Learning Analytics (LA)
Learning is the product of interaction between learners and the learning environment, and among students, educators, teachers and others (Elias & Lias, 2011).
Traditionally, learning is evaluated on the basis of student/pupil outcomes. This involves assessing knowledge, but also trying to answer questions such as: what does this student need, how can learning be improved, and how could the course interface be changed to make it more accessible? At present, especially in the pre-university system, learning evaluation is based on questionnaires. Obtaining feedback takes a long time because manual data processing is slow, and the possibilities for analysis are quite limited.
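Much of this delay disappears once questionnaire responses are processed automatically. As a minimal sketch, assuming answers on a 1-5 Likert scale stored per respondent (question names and data are hypothetical), the code below computes each question's mean rating and the share of low ratings, so that problem areas can be reported as soon as responses arrive.

```python
from statistics import mean

def summarize(responses, low=2):
    """Per-question mean rating and share of ratings <= low.

    responses: list of dicts mapping question id -> rating on a 1-5 scale.
    Unanswered questions are simply skipped for that respondent.
    """
    questions = sorted({q for r in responses for q in r})
    report = {}
    for q in questions:
        ratings = [r[q] for r in responses if q in r]
        report[q] = {
            "mean": mean(ratings),
            "share_low": sum(x <= low for x in ratings) / len(ratings),
        }
    return report

# Hypothetical responses from three pupils
responses = [
    {"clarity": 4, "pace": 2, "materials": 5},
    {"clarity": 5, "pace": 1},
    {"clarity": 3, "pace": 2, "materials": 4},
]
for q, stats in summarize(responses).items():
    print(f"{q}: mean {stats['mean']:.2f}, low ratings {stats['share_low']:.0%}")
```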
The desire to improve the quality of learning and assessment in the educational system is growing internationally, and in our country as well. Traditional systems are confronted with huge amounts of highly diverse data. Learning Analytics (LA) tries to answer questions about how these data can be used, transformed and analyzed to provide useful information that adds value to the learning process (Liu & Fan, 2014).
In 2011, at the first International Conference on Learning A…