Predicting the Performance of Football Teams in the Italian Serie A League Using Neural Networks and Machine Learning Methods through Historical Data Analysis
Amininiaki, Masoud (2024)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2024060722050
Abstract
The aim of this thesis is to design several prediction models, based on machine learning algorithms and neural network methods, that predict the performance of football teams, their final ranking and the champion of the Serie A football league from the available historical match data. To this end, seven predictive models were created in Jupyter Notebook, an environment based on the Python programming language. The models were compared analytically and comprehensively in terms of error and accuracy, and the best predictive model was identified. The ranking provided by betting websites was compared with the team ranking produced by the predictive models of this thesis. The importance of each variable used in the predictive models for the target variable was determined, identifying which feature has the greatest impact on the predictions.
In the first modeling step, the data required by the predictive models was collected from different sources and entered into two CSV files. One contains the data of the seven past league seasons, and the other contains the data of the current Serie A season (2023-2024). After data processing, seven prediction models based on machine learning and neural network methods were designed in Python: Linear Regression (LR), Decision Tree (DT), Gradient Boosting Machine (GBM), Neural Networks (NN), Random Forest (RF), Support Vector Machine (SVM) and Extreme Gradient Boosting (XGB). The models were validated after training and testing, using 5-fold cross-validation. The outcomes of the seven models were compared analytically, and the errors (Mean Absolute Error and Mean Squared Error) and accuracy criteria of each model were presented in tables and graphs.
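The validation step described above can be illustrated with a minimal sketch of 5-fold cross-validation that reports Mean Absolute Error and Mean Squared Error. This is not the thesis code: it uses only the Python standard library, a one-feature least-squares line stands in for the seven models, and the data is synthetic. The helper names (`fit_line`, `k_fold_indices`, `cross_validate`), the single feature, and the figure of 140 rows (assuming seven seasons of 20 teams) are assumptions made for illustration.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(xs, ys, k=5, seed=0):
    """Return the MAE and MSE averaged over k held-out folds."""
    maes, mses = [], []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        # Fit on the training rows only, then score on the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [ys[i] - (a + b * xs[i]) for i in fold]
        maes.append(mean(abs(e) for e in errors))
        mses.append(mean(e * e for e in errors))
    return mean(maes), mean(mses)


# Synthetic illustration data, NOT taken from the thesis:
# one hypothetical row per team and season.
rng = random.Random(42)
goals_per_game = [rng.uniform(0.7, 2.4) for _ in range(140)]
points = [30 * g + 10 + rng.gauss(0, 5) for g in goals_per_game]

mae, mse = cross_validate(goals_per_game, points, k=5)
print(f"5-fold MAE: {mae:.2f}, MSE: {mse:.2f}")
```

In a real project, a library such as scikit-learn would likely replace the hand-written fold splitting and model fitting; the sketch only shows what each fold computes.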
After comparing the predictive models and examining each in detail, it was found that the LR, GBM, RF, DT and XGB models are suitable for predicting the performance of football teams, and that they predicted the team ranking of the current league more accurately than the other models. The most important feature was average goals per game. Among these five suitable models, GBM and Linear Regression gave the most accurate predictions.
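One way to compare a predicted ranking with a reference ranking, such as the actual table or a betting-site ranking, is to sort the teams by predicted points and count the positions that agree. The sketch below assumes this simple exact-match criterion; the abstract does not state which ranking criterion the thesis used. The team names, point values, and function names are hypothetical placeholders.

```python
def ranking_from_points(predicted_points):
    """Sort team names by predicted points, highest first (ties broken by name)."""
    return sorted(predicted_points, key=lambda t: (-predicted_points[t], t))


def exact_position_matches(predicted, reference):
    """Count teams placed at the same position in both rankings."""
    return sum(p == r for p, r in zip(predicted, reference))


# Hypothetical predicted season points; not results from the thesis.
predicted_points = {"Team A": 86.2, "Team B": 74.5, "Team C": 78.1, "Team D": 60.3}
# Hypothetical reference ranking, e.g. the final table or a betting-site ranking.
reference_ranking = ["Team A", "Team B", "Team C", "Team D"]

predicted_ranking = ranking_from_points(predicted_points)
matches = exact_position_matches(predicted_ranking, reference_ranking)
print(predicted_ranking)
print(f"{matches} of {len(reference_ranking)} positions match")
```

Exact-match counting is strict, since swapping two neighbouring teams costs two positions. A rank correlation would be a softer alternative.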