Dimension Independent Cosine Similarity for Collaborative Filtering using MapReduce

Published date
2016-02
Language
eng
File type
application/pdf
Extent
5 pages
Citation
Proceedings of the 8th International Conference on Knowledge and Smart Technology (KST 2016) – IEEE Xplore, pp. 72-76
Abstract
DIMSUM, an efficient and accurate all-pairs similarity algorithm for large-scale real-world datasets, tackles the shuffle-size problem that arises when several similarity measures are computed with MapReduce. The algorithm uses a sampling technique to reduce 'power items' while preserving similarities. This paper presents DIMSUM+, an improved algorithm that enhances DIMSUM with a more complex sampling technique so that it can further reduce 'power users'. The algorithm generates a k-nearest-neighbor matrix for use in collaborative-filtering recommender systems. Evaluations on the MovieLens dataset with 1 million movie ratings and the Yahoo! Music dataset with 700 million song ratings show a significant improvement: DIMSUM+ is at least 1.4x faster than DIMSUM.