Classifying news articles based on user needs using transfer learning and deep neural networks: a multi-class approach combining BERT with non-textual features
Kuuluvainen, Elina (2023)
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2023121838120
Abstract
This thesis describes the development of a machine learning model that categorises news articles according to audience insight experts' assessments of user needs. The study aims to determine whether a machine learning algorithm can be used to identify user needs in the context of Yle News. The research method involves fine-tuning a pre-trained large language model, BERT, using both textual and non-textual features of news articles. The data is collected from Yle’s data warehouse and combined with manually labelled user needs classes. The model is implemented in Python with the PyTorch and Hugging Face Transformers libraries. The results show that the model can categorise articles into user needs groups with a reasonable level of accuracy. However, limitations arise from the small size of the dataset and its uneven distribution across the user needs categories. Based on the results, the trained model may not generalise well to unseen data. The thesis concludes by suggesting avenues for improvement and application, such as acquiring a more balanced dataset and exploring alternative deep learning models. In summary, the study shows that machine learning algorithms hold potential for classifying news articles based on user needs, but further refinement and development are required for future use cases.
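The combination of textual and non-textual features described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the architecture, so everything here is an assumption made for illustration: the class name `BertWithExtraFeatures`, the choice to concatenate the [CLS] embedding with the non-textual feature vector, the layer sizes, the number of non-textual features (3) and the number of user needs classes (4). A tiny, randomly initialised BERT is used so that the example runs offline; a real experiment would load a pre-trained checkpoint with `BertModel.from_pretrained`.

```python
import torch
from torch import nn
from transformers import BertConfig, BertModel


class BertWithExtraFeatures(nn.Module):
    """Hypothetical classifier: BERT text embedding joined with non-textual features."""

    def __init__(self, bert: BertModel, num_extra_features: int, num_classes: int):
        super().__init__()
        self.bert = bert
        hidden = bert.config.hidden_size
        # Classification head over the concatenated text and non-text vector.
        self.classifier = nn.Sequential(
            nn.Linear(hidden + num_extra_features, hidden),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, input_ids, attention_mask, extra_features):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # Use the representation of the first ([CLS]) token as the article embedding.
        cls_embedding = outputs.last_hidden_state[:, 0]
        combined = torch.cat([cls_embedding, extra_features], dim=1)
        return self.classifier(combined)  # unnormalised class scores (logits)


# Assumption: a tiny random BERT stands in for a pre-trained checkpoint.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    max_position_embeddings=64,
)
model = BertWithExtraFeatures(BertModel(config), num_extra_features=3, num_classes=4)
model.eval()

# Dummy batch: 2 articles, 16 token ids each, plus 3 non-textual features per article.
input_ids = torch.randint(0, 100, (2, 16))
attention_mask = torch.ones(2, 16, dtype=torch.long)
extra_features = torch.rand(2, 3)

with torch.no_grad():
    logits = model(input_ids, attention_mask, extra_features)
predicted = logits.argmax(dim=1)  # one predicted class index per article
```

During fine-tuning, the logits would typically be passed to `torch.nn.CrossEntropyLoss` together with the manually labelled classes. Concatenation is only one way to combine the two feature types and is not necessarily the approach taken in the thesis.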