Performance Analysis of ENSFM Recommendation Systems on Modern E-Commerce Datasets
Nguyen, Linh (2025)
All rights reserved. This publication is copyrighted. You may download, display, and print it for your own personal use. Commercial use is prohibited.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:amk-2025050910308
Abstract
Recommender systems are essential tools that analyze user behaviors, preferences, or characteristics to provide tailored suggestions. Various recommender methods exist, such as collaborative filtering, content-based filtering, hybrid models, and context-aware techniques. However, deploying any of these efficiently on extremely large-scale datasets remains a major challenge. This thesis addresses that gap by investigating the application of Efficient Non-Sampling Factorization Machines (ENSFM) to large-scale, context-rich datasets. Specifically, it evaluates the feasibility of training the ENSFM model on real-world datasets that are significantly larger and more complex than the original benchmark datasets.
The research primarily uses Efficient Non-Sampling Factorization Machines (ENSFM), a factorization model designed to handle large-scale implicit-feedback datasets without relying on negative sampling. Experiments were conducted on two real-world datasets, Yelp2018 and Amazon Book Reviews 2023, after preprocessing that included data cleaning, re-indexing of user and item IDs, and leave-one-out splitting for training and evaluation.
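The re-indexing and leave-one-out steps mentioned above can be illustrated with a minimal Python sketch. It is not the thesis's actual pipeline. The function names `reindex` and `leave_one_out_split`, the `(user, item, timestamp)` tuple layout, and the choice to hold out each user's most recent interaction are all assumptions made for illustration; the abstract does not say which interaction is held out or how single-interaction users are treated.

```python
from collections import defaultdict


def reindex(values):
    """Map raw IDs to contiguous integers 0..n-1 in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return mapping


def leave_one_out_split(interactions):
    """Split (user, item, timestamp) records into train and test pairs.

    Assumption: the latest interaction of each user is held out for testing.
    Users with a single interaction stay entirely in the training set.
    """
    interactions = list(interactions)
    user_map = reindex(u for u, _, _ in interactions)
    item_map = reindex(i for _, i, _ in interactions)

    # Group each user's events as (timestamp, item_index) pairs.
    by_user = defaultdict(list)
    for u, i, t in interactions:
        by_user[user_map[u]].append((t, item_map[i]))

    train, test = [], []
    for user, events in by_user.items():
        events.sort()  # oldest first
        *history, (_, held_out) = events
        if history:
            test.append((user, held_out))
        else:
            history = events  # nothing to hold out for this user
        train.extend((user, item) for _, item in history)
    return train, test, user_map, item_map


# Hypothetical toy data: raw string IDs with integer timestamps.
raw = [
    ("u_a", "book_9", 3),
    ("u_a", "book_2", 1),
    ("u_b", "book_2", 5),
    ("u_a", "book_7", 2),
]
train, test, user_map, item_map = leave_one_out_split(raw)
```

In this toy example, user `u_a` has three interactions, so the most recent one (`book_9`) goes to the test set, while the single interaction of `u_b` stays in training. Contiguous integer IDs matter because embedding-based models such as factorization machines look up users and items by row index.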
The experimental results indicate that the ENSFM model has limited effectiveness when applied to multi-category datasets such as Yelp2018 and Amazon Book Reviews 2023. Specifically, the model shows accuracy problems, long computation times, high computational costs, demanding hardware requirements, and poor scalability. These limitations point to the need for further optimization and adaptation of ENSFM before it can handle complex, multi-category real-world datasets.