    Dimension Independent Cosine Similarity for Collaborative Filtering using MapReduce 

    Shen, Fei; Rachsuda Jiamthapthaksin (2016-02)

    DIMSUM, an efficient and accurate all-pairs similarity algorithm for real-world large-scale datasets, tackles the shuffle-size problem of several similarity measures computed with MapReduce. The algorithm uses a sampling technique to reduce 'power items' while preserving similarities. This paper presents an improved algorithm, DIMSUM+, which enhances DIMSUM with a complex sampling technique so that it can further reduce 'power users'. The algorithm generates a k-nearest-neighbor matrix that is used in collaborative-filtering-based recommender systems. The evaluations of ...
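
The abstract does not spell out the sampling step, so the following is only a minimal single-process sketch of the baseline DIMSUM idea as it is commonly described: each co-occurring column pair is emitted with probability min(1, gamma / (||c_j|| ||c_k||)), and the accumulated sums are rescaled so that they estimate cosine similarity. It does not attempt DIMSUM+. The function name `dimsum_cosine`, the parameter `gamma`, and the dict-based row format are assumptions made for illustration, and the map and reduce phases are simulated with plain loops rather than a real MapReduce job.

```python
import math
import random
from collections import defaultdict


def dimsum_cosine(rows, gamma, seed=0):
    """Estimate cosine similarity between columns of a sparse matrix.

    rows  : list of dicts mapping column index -> nonzero value
            (hypothetical input format chosen for this sketch).
    gamma : oversampling parameter; larger values drop fewer pairs.
    """
    rng = random.Random(seed)

    # First pass: L2 norm of every column.
    squared = defaultdict(float)
    for row in rows:
        for j, value in row.items():
            squared[j] += value * value
    norm = {j: math.sqrt(s) for j, s in squared.items()}

    # "Map" phase: for each row, emit each column pair with
    # probability min(1, gamma / (||c_j|| * ||c_k||)).
    sums = defaultdict(float)
    for row in rows:
        cols = sorted(row)
        for a in range(len(cols)):
            for b in range(a + 1, len(cols)):
                j, k = cols[a], cols[b]
                keep_prob = min(1.0, gamma / (norm[j] * norm[k]))
                if rng.random() < keep_prob:
                    sums[(j, k)] += row[j] * row[k]

    # "Reduce" phase: rescale the partial sums. Pairs that were never
    # sampled are divided by the norm product, sampled pairs by gamma.
    return {
        (j, k): total / min(gamma, norm[j] * norm[k])
        for (j, k), total in sums.items()
    }
```

With a very large `gamma` every pair is kept and the result reduces to exact cosine similarity; smaller values trade accuracy for fewer emitted pairs, which is the shuffle-size saving the abstract refers to. Columns with large norms are the ones whose pairs are most likely to be dropped.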