Predicting budget adequacy: a case study for Kieku data
Väisänen, Teemu (2023)
All rights reserved. This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2023121236515
Abstract
Machine learning (ML) has gained popularity in recent years due to the increasing availability of computing power and tools such as ChatGPT. This has created more interest in ML, and companies have started to invest in related technologies. As interest grew, more exploration was done to find out where ML could be applied, which eventually led to attempts to predict future events such as engineering malfunctions and stock prices.
The financial field also offers many use cases, but this study focuses on budget adequacy prediction. The goal was to use ML algorithms to predict whether the given budget covers the costs of each tracking subject through the end of the year. The available data consisted of the monthly expenses, salaries, and budget for each subject. Since this data was already being gathered for operational use, it was theorized that it could also be used to train an ML model to create predictions. The model was in turn meant to be used by the financial department to keep track of each budgeted tracking subject, so that if some of them were in danger of going over budget, the department could react before the budget was exceeded. The study examines whether the data can be used to train an ML model that learns budget behavior patterns and creates accurate predictions based on them.
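The prediction target described above can be framed as a binary classification problem. The following is a minimal sketch of that framing only, not the thesis implementation: the `TrackingSubject` record, its field names, the choice of features, and the example figures are all hypothetical and do not reflect the actual Kieku schema or data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TrackingSubject:
    """Hypothetical record for one budgeted tracking subject (not the Kieku schema)."""
    name: str
    annual_budget: float
    monthly_expenses: List[float]  # 12 values, January to December
    monthly_salaries: List[float]  # 12 values, January to December


def features_through(subject: TrackingSubject, month: int) -> List[float]:
    """Build a feature vector using only the first `month` months of the year."""
    spent = sum(subject.monthly_expenses[:month]) + sum(subject.monthly_salaries[:month])
    return [
        float(month),                   # how far into the year we are
        spent / subject.annual_budget,  # share of the budget already used
        spent / month,                  # average monthly cost so far
    ]


def budget_is_adequate(subject: TrackingSubject) -> bool:
    """Label: True when the full-year costs stay within the annual budget."""
    total = sum(subject.monthly_expenses) + sum(subject.monthly_salaries)
    return total <= subject.annual_budget


def training_rows(subjects: List[TrackingSubject], month: int) -> List[Tuple[List[float], bool]]:
    """Pair mid-year features with the end-of-year label for each subject."""
    return [(features_through(s, month), budget_is_adequate(s)) for s in subjects]


# Made-up example values, for illustration only.
steady = TrackingSubject("steady", 120_000.0, [2_000.0] * 12, [7_000.0] * 12)
overrun = TrackingSubject("overrun", 100_000.0, [3_000.0] * 12, [7_000.0] * 12)
rows = training_rows([steady, overrun], month=6)
```

Rows built this way from historical years, where the end-of-year outcome is already known, could then be passed to any classifier; during the current year only the features are available and the label is what the model predicts.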
The implementation takes advantage of open-source software, Python libraries, and the Microsoft Azure cloud environment. The data comes from Kieku, the government HR and financial system, which has been in use since 2016 and whose data is readily available for research purposes. After multiple ML algorithms were evaluated, an artificial neural network (ANN) and a random forest classifier (RFC) were both found to be accurate for this purpose. In the testing phase, the ANN model reached 96% accuracy and the RFC model reached 99% accuracy. Further testing should be done with production data to see whether differences between production data and historical data affect the models' accuracy.
Improvements to the underlying data quality are presented, as they could improve the machine learning models' accuracy and overall performance. In addition, a visualization of the predictions for end users would benefit the process, as visual representations tend to be easier to read.