Detecting Fake News on Social Media: a data mining perspective - exploring machine learning
Obajimi, Adewunmi George (2025)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2025051411483
Abstract
The rapid spread of false information seriously threatens the quality of information and the stability of society. This study examined how machine learning techniques can be used to detect false information on online channels. It used a near-balanced dataset from Kaggle comprising 51,063 entries, categorised as real news (24,563) and fake news (26,500). Several supervised machine learning algorithms, including logistic regression, random forest, and gradient boosting, were used to build and evaluate the models. Data preparation consisted of text normalisation, tokenisation, and feature extraction using the Term Frequency-Inverse Document Frequency (TF-IDF) technique. The classes are close to balanced, which supports reliable model performance. The models were assessed with several measures: accuracy, precision, recall, F1-score, and AUC-ROC. According to the results, logistic regression was the best-performing model, with an F1-score of 0.959 and a precision of 96.1 percent. Visualisations, including confusion matrices, bar charts, and metric comparisons, made the model predictions easier to interpret. By offering scalable solutions appropriate for real-world situations, this study underlines the potential of machine learning to address the growing problem of disinformation. Future studies should concentrate on transformer-based deep learning architectures to improve contextual analysis, and on multilingual datasets to broaden applicability.
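As a rough illustration of two steps named in the abstract, the sketch below computes textbook TF-IDF weights and then precision, recall, and F1 in plain Python. It is a minimal sketch and not the study's implementation: the function names (`tokenize`, `tfidf`, `precision_recall_f1`), the toy headlines, and the toy labels are all hypothetical, and the unsmoothed formula idf = ln(N / df) is an assumption that may differ from whichever library variant the study used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumes tf = count / document length and idf = ln(N / df), the
    textbook form; library implementations often use smoothed variants.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the chosen positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy inputs, not drawn from the study's dataset.
docs = [
    "Shocking miracle cure doctors hate",
    "Council approves new budget",
    "Miracle diet shocks doctors",
]
vectors = tfidf(docs)

# Hypothetical labels: 1 = fake, 0 = real.
scores = precision_recall_f1([1, 0, 1, 1], [1, 0, 0, 1])
```

In this toy corpus, a term that appears in only one headline (such as "cure") receives a higher weight than one shared by two headlines (such as "miracle"), which is the property that makes TF-IDF useful as classifier input.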
Keywords
Fake News Detection, Machine Learning, Gradient Boosting, Supervised Learning, Natural Language Processing, TF-IDF, Digital Misinformation, Text Classification, Data Preprocessing.