Predictive Maintenance With Machine Learning
Vasilev, Vladimir (2026)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-202605069603
Abstract
Predictive maintenance is often a difficult task. Building a model that predicts whether a machine is going to fail involves several challenges, such as the limited availability of open-source datasets, severe class imbalance in which failures are rare events, and complex relationships between machine operating conditions. This study compared five machine learning models for predicting equipment failures from structured telemetry data.
The dataset used in this study included temperature, rotational speed, torque and tool wear measurements. Before training, the data was cleaned by removing anomalies, resampled using SMOTENC to increase the failure class proportion from 3.4% to 20%, and prepared through encoding and scaling. Five models were trained and compared: Logistic Regression, K-Nearest Neighbours, Support Vector Classifier, Random Forest and XGBoost. All models were tuned using GridSearchCV and evaluated with the F2-score as the primary metric, because missing a real failure is more costly than raising a false alarm.
The results showed clear differences in how well each model handled failure detection in imbalanced data. Some models struggled with rare failure cases, while others demonstrated a strong ability to identify upcoming malfunctions. Feature importance analysis revealed which operational variables contributed most to the predictions. The study also explored how data preprocessing decisions, such as resampling strategy and threshold selection, influenced final model performance. Based on these findings, recommendations for future development and real-world deployment were proposed.
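As an illustration of two quantities mentioned in the abstract, the following minimal sketch computes an F-beta score from confusion-matrix counts and the minority-to-majority ratio that corresponds to a 20% failure share. It is not the author's code: the function names and all counts are hypothetical, chosen only to show why the F2-score favours a model that raises false alarms over one that misses failures. In practice the same metric is available as `fbeta_score` in scikit-learn, and the ratio is the kind of value that could be passed as a float `sampling_strategy` to an imbalanced-learn resampler such as SMOTENC.

```python
def fbeta(tp, fp, fn, beta=2.0):
    """F-beta score from confusion-matrix counts.

    beta > 1 weights recall (caught failures) more heavily than precision.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def minority_to_majority_ratio(target_share):
    """Minority/majority count ratio that gives the minority class
    the requested share of the resampled dataset."""
    return target_share / (1.0 - target_share)


# Hypothetical test set with 100 real failures; each model makes 20 errors.
# Model A misses 20 failures and raises no false alarms.
f2_misses = fbeta(tp=80, fp=0, fn=20, beta=2.0)
# Model B catches every failure but raises 20 false alarms.
f2_false_alarms = fbeta(tp=100, fp=20, fn=0, beta=2.0)

# A 20% failure share means one failure for every four normal samples.
ratio = minority_to_majority_ratio(0.20)

print(f"F2, model with missed failures: {f2_misses:.3f}")
print(f"F2, model with false alarms:    {f2_false_alarms:.3f}")
print(f"minority/majority ratio for a 20% share: {ratio:.2f}")
```

With the same number of errors, the model that misses failures receives the lower F2-score, which matches the stated rationale for choosing this metric.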
