ML-Driven Predictive Alerting and Dashboard Development for Cloud-Ops Monitoring
Tamang Bomjan, Prasanna (2025)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:amk-2025120833582
Abstract
Contemporary cloud environments generate large volumes of data and metrics across services, systems, and infrastructure. Continuous and effective monitoring is fundamental to supporting cloud operations automation, maintaining reliability, and optimising resource utilization. Nokia, one of the leading global companies in the telecommunications and cloud services domain, requires constant monitoring and analysis of operational data. To address this need, the AMPS (AI/ML Platform Services) team at Nokia has adopted Grafana, an open-source platform known for its visualization and analytics capabilities.
This thesis is structured around three primary objectives. The first focuses on collecting data and metrics from heterogeneous data sources to build dashboards for comprehensive visualization. The second emphasises implementing predictive alerting by leveraging historical data to forecast potential issues and threshold breaches, moving from reactive to proactive monitoring. The third explores the feasibility of developing a custom Grafana panel plugin to visualise the outcomes of the predictive alerting model.
The theoretical framework discusses observability and data visualization as fundamentals for monitoring cloud systems, underlining how they provide deeper insights into performance, reliability, and security. It also explores the role of Grafana in building interactive dashboards that support data visualization and analytics. Likewise, it investigates custom panel development in Grafana. Furthermore, it reviews machine learning and predictive alerting, discussing relevant technologies.
The empirical part of the thesis focuses on the practical implementation of predictive alerting using historical metrics as well as the development of dashboards with Grafana. The final product of this thesis is a set of Grafana dashboards developed for the AMPS team, together with predictive alerting that enhances observability in their cloud operations.
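As an illustration of what forecasting a threshold breach from historical data can look like, the following is a minimal sketch and not the model used in the thesis. It assumes evenly spaced metric samples and a simple least-squares linear trend; the function names `fit_linear_trend` and `steps_until_breach` and the parameters `threshold` and `horizon` are hypothetical and chosen only for this example.

```python
from typing import Optional, Sequence, Tuple


def fit_linear_trend(values: Sequence[float]) -> Tuple[float, float]:
    """Fit a least-squares line through (0, v0), (1, v1), ...

    Returns (slope, intercept). Assumes evenly spaced samples.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def steps_until_breach(
    values: Sequence[float], threshold: float, horizon: int
) -> Optional[int]:
    """Return the first future step (1..horizon) at which the extrapolated
    trend reaches the threshold, or None if no breach is forecast."""
    slope, intercept = fit_linear_trend(values)
    last_index = len(values) - 1
    for step in range(1, horizon + 1):
        forecast = intercept + slope * (last_index + step)
        if forecast >= threshold:
            return step
    return None
```

A proactive alert could then fire whenever `steps_until_breach` returns a value rather than `None`, instead of waiting for the metric itself to cross the threshold. A linear trend is the simplest possible choice here; metrics with seasonality or sudden level shifts would need a richer forecasting model.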
