Machine learning methods vs. traditional methods in forecasting loss reserves
Kotsalo, Niitta (2021)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
This study investigates whether machine learning (ML) methods can provide better estimates of loss reserves than the traditional chain-ladder method. The aim is to give the commissioning organisation its own automated tools for producing predictions without expert knowledge and to reduce the manual work of future data analysis. The study asks whether an ML method can predict outstanding loss reserves and the ultimate claims to be paid, and whether it produces better estimates than chain-ladder. Estimating loss reserves is challenging in the insurance industry because claims data fluctuate randomly. The data used in the research turned out to be smaller than expected and are highly imbalanced and biased, so forecasting errors are possible. The dataset consists of real-life individual claims data collected by the commissioning company. Quantitative research methods are applied. Ridge regression, an ML method, and the traditional chain-ladder method are used to predict the loss reserves. Logistic regression and random forest multiclass classifiers are trained to predict the development delay. The models are evaluated by the accuracy of their predictions against actual results and by AUC. Recent research applying modern machine learning methods to the loss reserving problem is reviewed, and the framework for chain-ladder theory by Mack (1994) is presented. The chain-ladder method is found to be simple and able to provide accurate predictions of ultimate claims. The ridge regression results were too inaccurate to be used for prediction. Ridge regression produced predictions at the level of individual claims, so it was not directly comparable with chain-ladder. Logistic regression gave the best results in predicting the development delay. In conclusion, chain-ladder is accurate and easy to apply in practice, while the machine learning methods can bring new insight from the data because they consider more variables. Data automation and the collection of more historical data for prediction are recommended for future work.
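As an illustration of the chain-ladder method referred to in the abstract, the following is a minimal sketch in Python. It is not the implementation used in the study. The triangle values are invented toy numbers, and the function names (development_factors, complete_triangle, outstanding_reserves) are hypothetical. The sketch assumes a cumulative claims triangle with one row per accident period and None for cells not yet observed, and it uses the standard volume-weighted development factors.

```python
from typing import List, Optional

Triangle = List[List[Optional[float]]]


def development_factors(triangle: Triangle) -> List[float]:
    """Volume-weighted chain-ladder factors from a cumulative triangle."""
    n_dev = len(triangle[0])
    factors = []
    for j in range(n_dev - 1):
        numerator = 0.0
        denominator = 0.0
        for row in triangle:
            # Use only rows where both neighbouring cells are observed.
            if row[j] is not None and row[j + 1] is not None:
                numerator += row[j + 1]
                denominator += row[j]
        factors.append(numerator / denominator)
    return factors


def complete_triangle(triangle: Triangle, factors: List[float]) -> List[List[float]]:
    """Fill unobserved cells by rolling the last known value forward."""
    completed = []
    for row in triangle:
        filled = list(row)
        for j in range(1, len(filled)):
            if filled[j] is None:
                filled[j] = filled[j - 1] * factors[j - 1]
        completed.append(filled)
    return completed


def outstanding_reserves(triangle: Triangle, completed: List[List[float]]) -> List[float]:
    """Reserve per accident period: projected ultimate minus latest observed."""
    reserves = []
    for row, full in zip(triangle, completed):
        latest_observed = [value for value in row if value is not None][-1]
        reserves.append(full[-1] - latest_observed)
    return reserves


# Toy cumulative paid claims (invented numbers, not data from the study).
toy_triangle: Triangle = [
    [100.0, 150.0, 165.0],
    [110.0, 170.0, None],
    [120.0, None, None],
]

factors = development_factors(toy_triangle)
completed = complete_triangle(toy_triangle, factors)
reserves = outstanding_reserves(toy_triangle, completed)

print("development factors:", factors)
print("ultimate claims:", [row[-1] for row in completed])
print("outstanding reserves:", reserves)
```

The last column of the completed triangle is the projected ultimate claims amount for each accident period, and the reserve is the difference between that projection and the latest observed cumulative value. Because the method works on aggregated triangle cells, its output is per accident period rather than per individual claim.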