Predictive auto-scaling in Kubernetes using ARIMA time series forecasting

Bhatnagar, Vasu Swarup

Bhatnagar, Vasu Swarup (2026)

The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2026051211343

Abstract

Kubernetes has become the industry standard for container orchestration, but its default autoscaling mechanism, the Horizontal Pod Autoscaler, operates reactively, scaling only after resource thresholds have already been breached. This delay between demand and available capacity, commonly known as the cold start problem, degrades application performance whenever traffic spikes.

This thesis proposes and implements a predictive auto-scaling system for Kubernetes using the ARIMA (AutoRegressive Integrated Moving Average) time series forecasting model. The system forecasts CPU utilisation up to 45 seconds ahead and proactively scales deployments before resource exhaustion occurs, addressing the fundamental lag inherent in reactive approaches.

The system was implemented on a local Kubernetes cluster created with kind, with a CPU-bound FastAPI application serving as the target workload. An in-cluster load generator pod simulated realistic traffic patterns across ramp-up and spike stages. CPU metrics were collected every 15 seconds to train the ARIMA model, and an automated grid search identified ARIMA(0,2,1) as the optimal parameter order, with an AIC of 596.89.
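
As an illustration of what an ARIMA(0,2,1) forecast looks like at this sampling rate, the following is a minimal pure-Python sketch, not the thesis implementation. It assumes the usual model form (1 - B)^2 y_t = e_t + theta * e_(t-1); the function name `arima_021_forecast`, the coefficient value `theta=-0.5`, and the sample readings are all hypothetical. With 15-second samples, a 45-second horizon corresponds to three forecast steps.

```python
from typing import List


def arima_021_forecast(series: List[float], theta: float, steps: int) -> List[float]:
    """Forecast `steps` points ahead for ARIMA(0,2,1) with MA coefficient `theta`.

    Hypothetical helper for illustration: `theta` is supplied by the caller
    rather than estimated from the data.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")

    # Rebuild the one-step-ahead residuals, starting from a residual of zero.
    residual = 0.0
    for t in range(2, len(series)):
        predicted = 2 * series[t - 1] - series[t - 2] + theta * residual
        residual = series[t] - predicted

    # Only the first forecast step sees the last residual; every later step
    # continues along the same straight line.
    history = list(series[-2:])
    forecasts = []
    for h in range(steps):
        shock = theta * residual if h == 0 else 0.0
        next_value = 2 * history[-1] - history[-2] + shock
        forecasts.append(next_value)
        history.append(next_value)
    return forecasts


SAMPLE_INTERVAL_S = 15   # from the text: metrics collected every 15 seconds
HORIZON_S = 45           # from the text: forecast up to 45 seconds ahead
steps = HORIZON_S // SAMPLE_INTERVAL_S

# Hypothetical CPU readings in millicores, not measured data.
cpu_millicores = [100.0, 120.0, 145.0, 175.0, 210.0]
forecast = arima_021_forecast(cpu_millicores, theta=-0.5, steps=steps)
print(forecast)
```

Because the model differences the series twice, the multi-step forecast lies on a straight line, so a steep ramp is extrapolated at the same slope over the whole horizon.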

Experimental results showed that the autoscaler scaled from 1 to 8 replicas during a load spike. The first scale-up was triggered at 224 millicores of observed CPU, well below the 400 millicore threshold, based on a forecast of 502 millicores 45 seconds ahead. The system subsequently scaled back down to 1 replica as load subsided, releasing all unnecessary resources. The four technical goals of the thesis (cold start elimination, resource efficiency, reduced manual intervention, and portability) were evaluated against the experimental results, with three goals fully met and one partially met.
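
The difference between reacting to the observed value and acting on the forecast can be sketched as follows. The abstract does not state how the replica count was derived, so this example assumes the proportional rule used by the Horizontal Pod Autoscaler, ceil(currentReplicas * metric / target); the function name `desired_replicas` and the bounds of 1 and 8 replicas (taken from the observed scaling range) are assumptions for illustration.

```python
import math


def desired_replicas(
    current_replicas: int,
    metric_millicores: float,
    target_millicores: float,
    min_replicas: int = 1,   # assumed lower bound
    max_replicas: int = 8,   # assumed upper bound
) -> int:
    """HPA-style proportional rule, clamped to the allowed replica range."""
    if target_millicores <= 0:
        raise ValueError("target must be positive")
    # Treat a negative metric (for example, a negative forecast) as zero load.
    ratio = max(metric_millicores, 0.0) / target_millicores
    raw = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, raw))


TARGET_MILLICORES = 400  # threshold quoted in the text

# Reactive view: the observed 224 millicores is below the threshold.
reactive = desired_replicas(1, 224, TARGET_MILLICORES)

# Predictive view: the 502 millicore forecast already exceeds it.
predictive = desired_replicas(1, 502, TARGET_MILLICORES)

print(reactive, predictive)  # 1 2
```

Under this assumed rule, the observed value alone keeps the deployment at one replica, while the forecast requests a second replica before the threshold is crossed.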

The results validate the predictive approach and demonstrate that ARIMA-based forecasting is a computationally lightweight and effective mechanism for proactive auto-scaling in Kubernetes environments. Limitations of the approach include occasional over-provisioning due to linear trend extrapolation, negative forecast values during cooldown, and the use of CPU as the sole scaling metric.
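
The negative forecasts mentioned above follow directly from straight-line extrapolation on a falling series. The sketch below is an illustration only: the readings are hypothetical, the helper names are invented for this example, and clamping at zero is one possible guard rather than something the abstract describes.

```python
from typing import List, Tuple


def extrapolate_linear(last_two: Tuple[float, float], steps: int) -> List[float]:
    """Continue the line through the last two observations for `steps` points."""
    previous, last = last_two
    slope = last - previous
    return [last + slope * (h + 1) for h in range(steps)]


def clamp_non_negative(values: List[float]) -> List[float]:
    """Replace physically impossible negative CPU forecasts with zero."""
    return [max(0.0, v) for v in values]


# Hypothetical cooldown readings in millicores: load is dropping quickly.
cooldown = (90.0, 40.0)

raw = extrapolate_linear(cooldown, steps=3)
safe = clamp_non_negative(raw)

print(raw)   # [-10.0, -60.0, -110.0]
print(safe)  # [0.0, 0.0, 0.0]
```

The same mechanism explains the over-provisioning case: on a steep ramp-up, the extrapolated line can overshoot the load that actually arrives.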

Collections

Theses (Open collection)