News Articles Topic Classification Using Transformer Model : An Interactive AI Application
Shrestha, Bishal Ram (2025)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2025121135215
Abstract
This thesis presents an interactive AI system that automatically classifies news articles into twelve categories, taking a headline and a description as input. Categorising news articles is an important step in the workflow before publishing, and the system helps editors make editorial decisions quickly and clearly. A balanced dataset of synthetic news samples covering all twelve categories was generated using the GPT-Neo 2.7B model. The dataset includes categories such as World, Politics, Business, Technology and Science, which supports a consistent representation of news patterns.
A RoBERTa base transformer model was fine-tuned on the dataset. Accuracy, macro precision, macro recall and macro F1-score were used in the evaluation to check that performance is fair across the twelve categories. The model achieved an accuracy of 51.87 percent and a macro F1-score of 0.52 on unseen test data, which shows that the model identifies more than half of the articles correctly and that performance is not skewed towards particular categories.
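The macro-averaged metrics named above can be illustrated with a minimal sketch in plain Python. This is not the evaluation code of the thesis; the function name `macro_scores` and the three-category toy labels are assumptions made for illustration. Each metric is computed per category and then averaged with equal weight, so no category dominates the result.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted category gains a false positive
            fn[truth] += 1  # true category gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return {
        "accuracy": correct / len(y_true),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


# Toy example with three of the category names (labels are made up).
y_true = ["World", "World", "Politics", "Politics", "Business", "Business"]
y_pred = ["World", "Politics", "Politics", "Politics", "Business", "World"]
scores = macro_scores(y_true, y_pred)
```

On a balanced test set, accuracy and macro recall coincide, which is one reason a balanced dataset makes the two headline numbers easy to compare.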
A Gradio web interface was used to demonstrate how the model would work in practice. The application displays the predicted topic, a confidence score and a routing suggestion based on a confidence threshold. When the prediction is uncertain or falls below the given threshold, the system suggests manual review so that an editor makes the selection. The tool is intended to support editorial judgement, not to automate it fully.
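The threshold-based routing described above could look roughly like the following sketch. It assumes that the confidence score is the largest softmax probability over the model's output scores and that the threshold is 0.6; the abstract states neither, and the function names `softmax` and `route` are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw model scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits, labels, threshold=0.6):
    """Return (topic, confidence, suggestion) for one article.

    The threshold value of 0.6 is an assumption for illustration only.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    if confidence >= threshold:
        suggestion = "auto-route"
    else:
        suggestion = "manual review"
    return labels[best], confidence, suggestion


# Made-up scores for three of the categories.
labels = ["World", "Politics", "Business"]
clear_case = route([4.0, 0.5, 0.1], labels)      # one score dominates
uncertain_case = route([1.0, 0.9, 0.8], labels)  # scores are close together
```

In the first call one category clearly dominates, so the article would be routed automatically; in the second the scores are nearly equal, so the sketch falls back to manual review by an editor.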
As this project demonstrates, transformer-based models can help editors in their daily news editing tasks by improving efficiency and making the classification of news articles more uniform. Despite some limitations, especially for categories with similar wording, the system shows that a transformer-based model can provide consistent and good news classification in an editorial workflow.
