Analysis and Evaluation of Similarity Metrics in Collaborative Filtering Recommender System
Guo, Shuhang (2014)
Guo, Shuhang
Lapin ammattikorkeakoulu
2014
All rights reserved
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2014052610051
Abstract
This research focuses on the field of recommender systems. The general aims of
this thesis are to summarize the state of the art in recommender systems, evaluate
the efficiency of traditional similarity metrics on a variety of data sets, and
propose an approach for modeling new similarity metrics.
The literature on recommender systems was studied to summarize the current
developments in this field. The recommendation and evaluation were implemented
with Apache Mahout, which provides an open source platform for building
recommender engines. By importing data into the project, a customized
recommender engine was built. Since the results of a collaborative filtering
recommender depend significantly on the choice of similarity metric and the
type of data, several traditional similarity metrics provided in Apache Mahout
were examined with the evaluator offered in the project, using five data sets
collected by academic groups.
From the evaluation, I found that each similarity metric achieved its best
performance when its adjustable parameters were optimized. The features of each
similarity metric were obtained and analyzed with practical data sets. In addition,
an approach that combines two traditional metrics was proposed in the thesis, and
it was shown to be applicable and efficient for the combination of Pearson
correlation and Euclidean distance.
Observing and evaluating traditional similarity metrics on practical data helps
in understanding their features and suitability, from which new models can be
created. In addition, the approach proposed for modeling new similarity metrics
can be useful both theoretically and practically.
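
The abstract does not state how the two metrics are combined, so the following
Python sketch is only an illustration of the general idea, not the method used in
the thesis. It computes Pearson correlation and a Euclidean-distance-based
similarity over the items two users have both rated, and then blends them. The
1/(1+d) mapping, the rescaling of Pearson to [0, 1], the weighted average, and the
names combined_similarity and weight are all assumptions made for this example.

```python
import math


def pearson_similarity(a, b):
    """Pearson correlation over co-rated items; result lies in [-1, 1]."""
    common = sorted(set(a) & set(b))
    n = len(common)
    if n < 2:
        return 0.0  # not enough overlap to measure correlation
    xs = [a[i] for i in common]
    ys = [b[i] for i in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant rating vector has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def euclidean_similarity(a, b):
    """Map Euclidean distance over co-rated items to (0, 1] via 1 / (1 + d).

    The mapping is an assumption of this sketch.
    """
    common = set(a) & set(b)
    if not common:
        return 0.0
    d = math.sqrt(sum((a[i] - b[i]) ** 2 for i in common))
    return 1.0 / (1.0 + d)


def combined_similarity(a, b, weight=0.5):
    """Hypothetical blend: weighted average of the two similarities.

    Pearson is first rescaled from [-1, 1] to [0, 1] so both terms share a range.
    """
    p = (pearson_similarity(a, b) + 1.0) / 2.0
    e = euclidean_similarity(a, b)
    return weight * p + (1.0 - weight) * e


# Example ratings: item id -> rating (made-up values for illustration only).
alice = {"a": 5, "b": 3, "c": 4}
bob = {"a": 4, "b": 2, "c": 3}
print(combined_similarity(alice, bob))
```

In this example the two users rate in the same pattern but one rates consistently
lower, so Pearson correlation treats them as perfectly similar while the
Euclidean term penalizes the offset; a blend reflects both views, and the weight
parameter controls which one dominates.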