Using Machine Learning Models to Predict the Study Path Selection of Business Information Technology Students
Saballe, Charlese Adriana (2019)
All rights reserved. This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2019052913099
Abstract
Educational data mining (EDM) is an emerging field of research that applies advanced machine learning concepts to the large volumes of data produced by educational systems, with the aim of improving the methods and tools used for learning and teaching in educational institutions.
In line with the Finnish government’s Vision 2030 for higher education, which hopes to develop more pre-emptive and anticipation-led digital learning services, this thesis explores the use of machine learning techniques to find accurate prediction and classification models for the two most common study path selections of students in the Degree Programme in Business IT of Haaga-Helia UAS. The models use features such as student-related characteristics and affinity to Software Development (SWD) and Digital Service Design (DSD), motivational goal, mastery orientation, and other demographic factors.
This quantitative research used questionnaire data gathered from 101 students in the BITe programme and followed the CRISP-DM framework as a guide through the various phases of the research. KNIME Analytics Platform, an open-source data mining tool, was used to pre-process, prepare, analyse and model the data.
Exploratory data analysis was undertaken to discover initial insights about the data and to trim the chosen factors. The bootstrap method was used as a sample-growing technique, and the data were partitioned into training (80%) and testing (20%) subsets. Three machine learning algorithms were used to model the data, and performance scores (accuracy, Cohen’s kappa and ROC curve) were set as the criteria for evaluating the models.
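The resampling, partitioning and scoring steps described above can be sketched in plain Python. This is a minimal illustration only, not the KNIME workflow used in the thesis: the function names, the grown sample size of 500 and the random seed are assumptions introduced for the example, and only the 101 respondents and the 80/20 split come from the text.

```python
import random
from collections import Counter


def bootstrap_sample(rows, size, rng):
    """Draw `size` rows with replacement (bootstrap resampling)."""
    return [rng.choice(rows) for _ in range(size)]


def split_train_test(rows, test_fraction, rng):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohens_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal class frequencies.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1:
        return 1.0  # only one class present on both sides
    return (p_observed - p_expected) / (1 - p_expected)


rng = random.Random(42)                  # assumed seed, for repeatability
respondents = list(range(101))           # stand-ins for the 101 questionnaire rows
grown = bootstrap_sample(respondents, 500, rng)   # 500 is a hypothetical size
train, test = split_train_test(grown, 0.20, rng)  # 80% training, 20% testing

# Toy labels to show the two scores; not data from the thesis.
y_true = ["SWD", "SWD", "DSD", "DSD"]
y_pred = ["SWD", "DSD", "DSD", "DSD"]
print(len(train), len(test), accuracy(y_true, y_pred), cohens_kappa(y_true, y_pred))
```

Cohen’s kappa is reported alongside accuracy because it discounts the agreement that would be expected by chance given the class frequencies, which matters when the two study paths are not equally common.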
The research succeeded in establishing a predictive model using logistic regression. Using the significant predictors (DSD and SWD factors, mastery intrinsic orientation, motivational goal, gender, geographical area and age), the model predicted study path selection with 85.5% accuracy. The validation test on the model achieved an even higher accuracy of 94%.
The research also put forward two highly accurate classification models: Random Forest (94% accuracy) and Decision Tree (93% accuracy). Because the performance measures of these models differ only slightly, both are recommended for student classification. The Random Forest gives a slightly higher accuracy, while the Decision Tree is easier to interpret because a classification rule can be extracted from its tree view.
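To illustrate what extracting a classification rule from a tree view means, the sketch below walks a small binary decision tree and prints one IF-THEN rule per leaf. The tree itself is hypothetical: the feature names `swd_affinity` and `dsd_affinity`, the thresholds and the leaf labels are invented for the example and are not the tree reported in the thesis.

```python
def extract_rules(node, conditions=()):
    """Return one (conditions, label) pair per leaf of a binary decision tree."""
    if "label" in node:  # leaf: the path taken so far is the rule
        return [(list(conditions), node["label"])]
    feature, threshold = node["feature"], node["threshold"]
    left = extract_rules(node["left"], conditions + (f"{feature} <= {threshold}",))
    right = extract_rules(node["right"], conditions + (f"{feature} > {threshold}",))
    return left + right


# Hypothetical tree, for illustration only.
toy_tree = {
    "feature": "swd_affinity",
    "threshold": 3.5,
    "left": {"label": "DSD"},
    "right": {
        "feature": "dsd_affinity",
        "threshold": 4.0,
        "left": {"label": "SWD"},
        "right": {"label": "DSD"},
    },
}

rules = extract_rules(toy_tree)
for conds, label in rules:
    print("IF " + " AND ".join(conds) + f" THEN {label}")
```

Each rule is simply the chain of tests on the path from the root to a leaf, which is why a single tree is easier to read than a Random Forest, where the prediction is a vote over many such trees.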
Model deployment was simulated in KNIME, and the final models were exported in PMML format. This opens the possibility of using the models in future research or deploying them in a study path recommender application for incoming students.
