Sales Forecasting for a Global Superstore Using Time Series and Machine Learning Models
Lu, Min (2026)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:amk-2026052616796
Abstract
This study uses the Global Superstore dataset to compare the performance of traditional time series
models and machine learning models in retail sales forecasting. It follows the CRISP-DM methodology
and evaluates predictive performance at two time granularities: daily sales and weekly sales.
The models compared are ARIMA, Holt-Winters exponential smoothing, linear regression, random forest,
XGBoost, and an artificial neural network (ANN). For the machine learning models, the study also
engineered temporal features, lag features, and rolling features. Model performance was evaluated
using mean absolute error (MAE), root mean square error (RMSE), the coefficient of determination (R²),
the explained variance score, and maximum error.
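
The feature types and metrics named above can be illustrated with a minimal, standard-library-only sketch. It is not the study's implementation: the function names (`build_features`, `evaluate`), the lag choices (1 and 7 days), the 7-day rolling window, and the toy sales figures are all assumptions made for illustration, and the metric formulas are the usual textbook definitions.

```python
import math
from datetime import date, timedelta


def build_features(dates, sales, lags=(1, 7), window=7):
    """Build temporal, lag, and rolling-mean features (hypothetical helper).

    Each row uses only values from before the target day, so the target
    never leaks into its own features.
    """
    start = max(max(lags), window)  # first index with full history
    rows, targets = [], []
    for i in range(start, len(sales)):
        row = {
            "day_of_week": dates[i].weekday(),  # temporal feature (Monday = 0)
            "month": dates[i].month,            # temporal feature
        }
        for lag in lags:
            row[f"lag_{lag}"] = sales[i - lag]  # lag feature
        past = sales[i - window:i]              # window excludes day i
        row[f"roll_mean_{window}"] = sum(past) / window  # rolling feature
        rows.append(row)
        targets.append(sales[i])
    return rows, targets


def evaluate(y_true, y_pred):
    """Compute the five metrics listed in the abstract (hypothetical helper)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    mean_err = sum(errors) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    var_err = sum((e - mean_err) ** 2 for e in errors) / n
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1 - ss_res / ss_tot,
        "explained_variance": 1 - var_err / (ss_tot / n),
        "max_error": max(abs(e) for e in errors),
    }


# Toy data: ten days of made-up daily sales (not from the dataset).
dates = [date(2024, 1, 1) + timedelta(days=i) for i in range(10)]
sales = [100.0, 120.0, 90.0, 110.0, 130.0, 95.0, 105.0, 115.0, 125.0, 85.0]

rows, targets = build_features(dates, sales)
# Naive baseline for demonstration: predict each day with the previous day's sales.
preds = [row["lag_1"] for row in rows]
metrics = evaluate(targets, preds)
print(rows[0])
print(metrics)
```

In a real pipeline the feature rows would feed a regression model rather than the naive previous-day baseline used here, which only serves to exercise the metric functions.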
The results show that, for both daily and weekly sales forecasting, the linear regression model using
the complete feature set (Full Features) performed best overall, while more complex models did not
significantly outperform the simpler ones. Weekly sales forecasts were more stable overall than daily
ones because time aggregation reduces noise. However, all models still showed clear limitations in
predicting abnormal sales peaks and extreme fluctuations. The results indicate that in retail sales
scenarios with limited data and large fluctuations, models with relatively simple structures and
strong interpretability can still provide some practical predictive value.
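
The smoothing effect of time aggregation can be sketched with a small standard-library example. The helper names (`to_weekly`, `coefficient_of_variation`), the Monday-based week boundary, and the made-up sales pattern are assumptions for illustration only; the study's own aggregation rule is not specified here.

```python
from collections import defaultdict
from datetime import date, timedelta


def to_weekly(dates, sales):
    """Sum daily sales into weeks keyed by the Monday starting each week."""
    totals = defaultdict(float)
    for d, s in zip(dates, sales):
        week_start = d - timedelta(days=d.weekday())  # Monday of that week
        totals[week_start] += s
    return dict(sorted(totals.items()))


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (relative spread)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return var ** 0.5 / mean


# Toy data: four weeks with a repeating weekly pattern and one irregular day.
dates = [date(2024, 1, 1) + timedelta(days=i) for i in range(28)]
pattern = [50, 60, 55, 65, 70, 150, 40]
sales = [float(pattern[i % 7]) for i in range(28)]
sales[12] += 30.0  # a single unusual day in the second week

weekly = to_weekly(dates, sales)
daily_cv = coefficient_of_variation(sales)
weekly_cv = coefficient_of_variation(list(weekly.values()))
print(weekly)
print(f"daily CV: {daily_cv:.3f}, weekly CV: {weekly_cv:.3f}")
```

On this toy series the weekly totals vary far less, relative to their mean, than the daily values do, because summing over a week absorbs the within-week swings. This shows the mechanism only and says nothing about the magnitude of the effect in the actual dataset.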
