Real-time Prediction of Flight Delays : Neural Networks vs Statistical Methods
Torres Rodriguez, Iraia (2025)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2025082124067
Abstract
This thesis investigates the real-time prediction of commercial flight delays by comparing machine learning and statistical time series models.
The goal is to determine which modelling approach performs better in forecasting delays and under what conditions. Using historical flight data from Brussels Airport for 2023, along with external weather and passenger traffic data, a wide range of models was generated for short-term, long-term, and route-specific predictions.
The study implements a comprehensive end-to-end pipeline, including data collection, preprocessing, feature engineering, and model training. Statistical models such as SARIMAX and Rolling Forecast were evaluated for their ability to capture trends and seasonality, while machine learning models, including Extreme Gradient Boosting (XGBoost), Gradient Boosting Machines (GBM), Random Forest, Linear Regression, and Long Short-Term Memory (LSTM) Neural Networks, were assessed for their flexibility and predictive accuracy.
Results indicate that machine learning models, particularly XGBoost, outperform statistical models in accuracy and adaptability, especially when handling complex relationships and external variables such as weather and traffic. However, statistical models such as Rolling Forecast proved effective for short-term predictions: they predict delays step by step and continuously adjust to newly observed data, which makes them more responsive to changes.
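The step-by-step behaviour of a rolling forecast can be illustrated with a minimal sketch. This is not the thesis implementation: the moving-average predictor is a stand-in for whatever model is refitted at each step, and the function name `rolling_forecast`, the `window` parameter, and the delay values are all hypothetical.

```python
from statistics import mean


def rolling_forecast(history, actuals, window=3):
    """Walk-forward, one-step-ahead forecast.

    At each step the prediction is the mean of the last `window`
    observations (a stand-in model, assumed for illustration). The
    observed value is then appended to the history, so the next
    prediction adapts to the newest data.
    """
    history = list(history)  # copy so the caller's list is not mutated
    predictions = []
    for observed in actuals:
        predictions.append(mean(history[-window:]))
        history.append(observed)  # update with the newly observed delay
    return predictions


# Hypothetical average daily delays in minutes (not data from the thesis).
train = [12.0, 15.0, 9.0]
test = [18.0, 21.0, 6.0]
preds = rolling_forecast(train, test, window=3)
print(preds)
```

Because each prediction only looks one step ahead and the history is refreshed after every observation, a sudden jump in delays influences the very next forecast, which is the responsiveness described above.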
The thesis also proposes an automated workflow using cron jobs for weekly data updates, model retraining, and delay forecasting. This would result in dynamic, up-to-date predictions without manual intervention, making it a scalable solution for real-world applications in aviation analytics.
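The proposed weekly workflow could be organised roughly as follows. This is a sketch under stated assumptions: the function `run_weekly_job`, its three callables, the stub implementations, and the crontab line and script path in the comment are hypothetical and are not taken from the thesis.

```python
from statistics import mean

# A crontab entry such as the following (hypothetical path) would run the
# job every Monday at 03:00:
#   0 3 * * 1 /usr/bin/python3 /path/to/weekly_job.py


def run_weekly_job(fetch_new_data, retrain, forecast, dataset):
    """Run one update cycle: extend the data, retrain, then forecast."""
    dataset = dataset + fetch_new_data()  # 1. weekly data update
    model = retrain(dataset)              # 2. model retraining
    return dataset, forecast(model)       # 3. delay forecasting


# Stub steps for illustration only; a real pipeline would call the data
# sources and models chosen by the author.
def fetch_stub():
    return [10.0, 20.0]  # hypothetical new delay observations (minutes)


def retrain_stub(data):
    return mean(data)  # the "model" is just the mean delay


def forecast_stub(model):
    return model  # predict the mean delay for the coming week


dataset, prediction = run_weekly_job(
    fetch_stub, retrain_stub, forecast_stub, dataset=[30.0]
)
print(dataset, prediction)
```

Passing the three steps in as callables keeps the scheduling logic separate from the data sources and models, so each part can be replaced without touching the cron entry.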