Net sales prediction: Azure Machine Learning
Roy, Sayak (2024)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2024052817260
Abstract
This Master’s thesis is a project-based thesis on sales prediction using Microsoft Azure Machine Learning models for a telecommunications and technology company in Finland. For many years, the company has followed a semi-manual sales forecasting procedure. The purpose of the project was to evaluate the accuracy of the machine learning models in Microsoft Azure and their ability to forecast future sales. The work is important because it gives the company a direction for automating its sales forecasting with machine learning.
In this thesis, Azure machine learning algorithms were tested, first with a public dataset and then with a company dataset, to determine whether they provide similar results. The public Walmart sales dataset was collected from Kaggle, whereas the company's historical sales data was provided by the company after it had been scrambled. Both the sample and the training data consisted of actual historical sales for different departments and products. The data was pre-processed in SQL databases and then used as training datasets for the Azure machine learning models.
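The thesis does not publish its SQL pre-processing, so the following is only a minimal sketch of the kind of step described: raw sales rows are loaded into a SQL table and aggregated into weekly totals per department, which is a typical shape for sales-forecasting training data. The table name `sales`, its columns, the weekly grain and the use of an in-memory SQLite database are all assumptions made for illustration.

```python
import sqlite3


def weekly_department_sales(rows):
    """Aggregate raw sales rows into weekly totals per department using SQL.

    Hypothetical schema (not taken from the thesis): one row per sale line
    with (store, dept, sale_date as 'YYYY-MM-DD', amount).
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales (store INTEGER, dept INTEGER, sale_date TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
        # strftime('%Y-%W', ...) buckets each date into a year-week key
        # (weeks start on Monday); rows with a missing amount are dropped.
        query = """
            SELECT dept,
                   strftime('%Y-%W', sale_date) AS week,
                   ROUND(SUM(amount), 2) AS weekly_sales
            FROM sales
            WHERE amount IS NOT NULL
            GROUP BY dept, strftime('%Y-%W', sale_date)
            ORDER BY dept, week
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

The aggregated rows, one per department and week, could then be exported as a training dataset. The weekly grain is only one choice; a daily or monthly grain would change only the `strftime` format.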
An Azure machine learning workspace was set up in the company environment, and the training data was uploaded to it. Multiple machine learning models (both regression and time series) were trained using the Kaggle (Walmart sales) data. After training, the models were fed a test dataset, on which their accuracy was measured using the Normalized Root Mean Squared Error (NRMSE) metric. The best-performing models were then isolated and trained with the company-specific data. Finally, the top-performing machine learning models were identified and listed. The results were compared with existing studies that used the same Kaggle dataset to verify whether the top-performing models were similar.
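To make the evaluation metric concrete, the sketch below computes Normalized Root Mean Squared Error on a test set. It assumes normalization by the range (max minus min) of the actual values, which is the convention Azure Automated ML describes for its `normalized_root_mean_squared_error` metric. Other sources normalize by the mean or the standard deviation instead, so values are only comparable under the same convention.

```python
import math
from typing import Sequence


def nrmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error divided by the range of the actual values.

    Assumes range normalization; lower values mean better predictions.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    value_range = max(actual) - min(actual)
    if value_range == 0:
        raise ValueError("actual values have zero range; NRMSE is undefined")
    return math.sqrt(mse) / value_range
```

Dividing by the range makes the error scale-free, which is what allows models trained on datasets with very different sales volumes, such as a public dataset and a company dataset, to be compared on one scale.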
Two algorithms, XGBoost and Random Forest, achieved the best (lowest) Normalized Root Mean Squared Error on both the Kaggle and the company datasets. They were therefore listed as the top-performing models and marked as candidates for future work at the company. The results of the experiments provide a baseline for predicting company sales with Azure machine learning models and can be developed further in the desired direction.
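The selection rule described here, keeping the models that rank among the best on both datasets, can be sketched as below. The function names are hypothetical, and the scores passed in are assumed to be NRMSE values per model, where lower is better. This is an illustration of the idea, not the procedure used in the thesis.

```python
from typing import Dict, List


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k model names with the lowest error (ties broken by name)."""
    ranked = sorted(scores.items(), key=lambda item: (item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def shared_top_models(
    public_scores: Dict[str, float], company_scores: Dict[str, float], k: int = 2
) -> List[str]:
    """Models ranked in the top k on both datasets, in public-dataset order."""
    company_top = set(top_k(company_scores, k))
    return [name for name in top_k(public_scores, k) if name in company_top]
```

A model that appears in both top-k lists has performed well on the public benchmark and on the company's own data, which is a stronger signal than a good score on either dataset alone.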
Abbreviations are listed at the end of the thesis. It is recommended that the reader keep the list at hand while reading, as it helps in understanding the terms used in the text.