Comparative Analysis of Clustering Techniques for Stock Selection in Finnish Stock Markets Using Common Financial Metrics
Rautio, Timo (2024)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2024060219855
Abstract
This thesis studied the efficacy of unsupervised machine learning techniques, specifically clustering algorithms, in the context of stock selection. The goal was to find out whether clustering methods are suitable for stock selection and whether they can outperform explicit rule-based strategies that rely on favourable financial ratios. Additionally, the study examined the generalizability of clustering results for profitable, risk-tolerant investment decisions.
The dataset reflected the state of the markets on the final trading day of 2022 and included some standard financial metrics, such as return on equity, price-to-earnings ratio, price-to-book ratio, debt-to-equity ratio, dividend yield, earnings yield, and earnings per share. Missing-value removal and outlier removal were applied as pre-processing measures, and the dataset was fed to different clustering algorithms including K-Means, hierarchical clustering, and the Gaussian Mixture Model (GMM). These clustering methods were selected due to their distinct clustering approaches and the ability to specify the number of clusters. The clustering results were compared using internal evaluation methods including the silhouette score, the Davies-Bouldin index, and the Dunn index. Results were also analysed using the annual price fluctuation between 2022 and 2023.
The results indicated that hierarchical clustering outperformed K-Means and GMM on the internal evaluation methods, which measure the compactness and separation of clusters. However, the differences were marginal. Analysis of the best-performing cluster revealed an average annual stock price growth of 36.76%, but as an investment strategy it presented a higher risk than the approach based on favourable financial metrics. These findings suggest that while hierarchical clustering can offer the best performance in some cases, its higher risk profile may limit its use in investment strategies.
As the results indicated that clustering algorithms used alone are poorly suited to stock selection, future research should explore other clustering algorithms such as DBSCAN, as well as neural network models. Future research should also focus on longer-term analysis to gain a better understanding across economic cycles, and should study and take into consideration industry-specific differences.
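The comparison described in the abstract (three clustering algorithms scored with the silhouette score, the Davies-Bouldin index, and the Dunn index) could be sketched roughly as follows. This is an illustrative sketch, not the thesis implementation: the function names `dunn_index` and `compare_clusterings`, the choice of three clusters, the standardization step, and the synthetic seven-column data standing in for the financial metrics are all assumptions made for the example.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def dunn_index(X, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter.

    scikit-learn has no built-in Dunn index, so this is a hand-written helper.
    It assumes at least two clusters and at least one cluster with two points.
    """
    clusters = [X[labels == k] for k in np.unique(labels)]
    max_diameter = max(pdist(c).max() if len(c) > 1 else 0.0 for c in clusters)
    min_separation = min(
        cdist(a, b).min()
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
    )
    return min_separation / max_diameter


def compare_clusterings(X, n_clusters=3, seed=0):
    """Fit three clustering models and score each with three internal metrics."""
    # Assumption: features are standardized so no single ratio dominates distances.
    X = StandardScaler().fit_transform(X)
    labelings = {
        "kmeans": KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X),
        "hierarchical": AgglomerativeClustering(n_clusters=n_clusters).fit_predict(X),
        "gmm": GaussianMixture(n_components=n_clusters, random_state=seed).fit_predict(X),
    }
    return {
        name: {
            "silhouette": silhouette_score(X, labels),          # higher is better
            "davies_bouldin": davies_bouldin_score(X, labels),  # lower is better
            "dunn": dunn_index(X, labels),                      # higher is better
        }
        for name, labels in labelings.items()
    }


# Synthetic stand-in data: 90 hypothetical stocks, 7 metric columns, 3 loose groups.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 7)) for c in (-4.0, 0.0, 4.0)])
scores = compare_clusterings(X_demo)
for name, metrics in scores.items():
    print(name, {k: round(float(v), 3) for k, v in metrics.items()})
```

Because the three metrics point in different directions (higher is better for silhouette and Dunn, lower is better for Davies-Bouldin), a ranking of the methods has to be read per metric rather than from a single combined number.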