Sensitivity of predictive performance assessment accuracy in varying k-fold cross validation
Kjeldsberg, Fabian; Munim, Ziaul Haque; Bustgaard, Morten; Bhagat, Sahil; Lindroos, Emilia; Haavardtun, Per (2025)
Editors
Kim, Tae Eun
Milrad, Marcelo
Remolar, Inmaculada
Springer Nature
2025
The permanent address of this publication is
https://urn.fi/URN:NBN:fi-fe202601092525
Abstract
In machine learning (ML) applications, cross-validation (CV) allows greater generalizability of a trained algorithm to out-of-sample or new data. This study explores the accuracy of trained ML algorithms in predicting student performance in a maritime simulator exercise scenario under four different k-fold CV settings. Models were trained with three-, five-, eight-, and ten-fold CV using a cloud-ML platform. Three top-performing ML algorithms were evaluated on log loss, accuracy, and area under the curve (AUC). The results indicate higher predictive accuracy with increasing k. Considering the trade-off between prediction accuracy and the time required to predict every 1000 observations, five-fold CV appears optimal for predictive learning analytics in the explored simulation training scenario. Prediction explanations for the five-fold CV are reported.
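
The comparison described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the study's pipeline: the cloud-ML platform, the algorithms, and the simulator data are not reproduced here. The synthetic one-feature dataset, the small logistic regression, and all function names (`k_fold_indices`, `fit_logistic`, `cross_validate`) are assumptions made for illustration only, and AUC is omitted for brevity. The sketch runs 3-, 5-, 8-, and 10-fold CV and reports mean accuracy, mean log loss, and prediction time per 1000 observations.

```python
import math
import random
import time


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_logistic(xs, ys, lr=0.1, epochs=200):
    """Fit a one-feature logistic regression by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, xs):
    """Return the predicted probability of class 1 for each input."""
    w, b = model
    return [1.0 / (1.0 + math.exp(-(w * x + b))) for x in xs]


def cross_validate(xs, ys, k, seed=0):
    """Run k-fold CV; return mean accuracy, mean log loss, and s per 1000 predictions."""
    folds = k_fold_indices(len(xs), k, seed)
    accs, losses = [], []
    pred_time, n_pred, eps = 0.0, 0, 1e-12
    for i, test_idx in enumerate(folds):
        # Train on every fold except the held-out fold i.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit_logistic([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        x_test = [xs[j] for j in test_idx]
        y_test = [ys[j] for j in test_idx]
        start = time.perf_counter()
        probs = predict_proba(model, x_test)
        pred_time += time.perf_counter() - start
        n_pred += len(x_test)
        accs.append(sum((p >= 0.5) == (y == 1) for p, y in zip(probs, y_test)) / len(y_test))
        losses.append(-sum(
            y * math.log(max(p, eps)) + (1 - y) * math.log(max(1.0 - p, eps))
            for p, y in zip(probs, y_test)
        ) / len(y_test))
    return {
        "accuracy": sum(accs) / k,
        "log_loss": sum(losses) / k,
        "sec_per_1000": 1000.0 * pred_time / n_pred,
    }


# Synthetic binary data (an assumption): class 1 centred at +1, class 0 at -1.
rng = random.Random(42)
ys = [rng.randint(0, 1) for _ in range(200)]
xs = [rng.gauss(1.0 if y else -1.0, 1.0) for y in ys]

results = {k: cross_validate(xs, ys, k) for k in (3, 5, 8, 10)}
for k, r in results.items():
    print(f"k={k:2d}  accuracy={r['accuracy']:.3f}  "
          f"log_loss={r['log_loss']:.3f}  sec/1000={r['sec_per_1000']:.5f}")
```

On a toy problem like this the differences between k values are small and depend on the random seed, so the output should not be read as reproducing the study's findings. It only shows how the metrics and the timing trade-off named in the abstract could be measured side by side.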
