Search code examples
mahout

how to build an efficient ItemBasedRecommender in Mahout?


I am building an Item Based Recommender System for 10 millions users who rate categories over 20 possible categories (news categories like politic, sport etc...) I would like for each one of them to be recommended at least another category which they don't know (no rating).

I runned a GenericUserBasedRecommender and asked for recommendations for each user but It looks extremely long: maybe 1000 user proceeded per minute. My questions are:

1- Can I run this same GenericUserBasedRecommender on hadoop and would it really be faster? I saw and run an ItemBasedRecommender with command line on a cluster, but I would rather run a User Based one.

1,5 - I saw many users not having a single recommendations. What is the alogrithm criterium to determine if a user get a recommendation? I thought It could be that the user who don't get recommendations are the one who only give a single rating, but I don't understand why.

2- Is there another smarter way to deal with my problem? Maybe some clustering solution instead of recommendation? I don't exactly see how.

3- Finally, am I right when I say that the algorithms who have no command line are not to be used with hadoop?

Thank you for your answers.


Solution

    1. Sometimes you won't get recommendations for certain items or users because there are few items over which they overlap. It could also be a case where the user data may be 'enough', but his behaviour/use patterns are very unique and/or disagreement with popular trends in the data.

    2. You could perhaps try LogLikelihood or Tanimoto based ItemSimilarity.

    3. Another thing you could look into is a Matrix Factorization based model. You could use the ALSWR Factorizer to generate recommendations. this method decomposes the original User-Item matrix, to a User-Feature, Item-Feature and Diagonal matrix,--> then reduces the dimensionality-->and then recronstructs the matrix which is closest to the original matrix with same rank. You might lose some data this method, but the missing values in the user-item matrix are imputed and you get estimate preference/recommendation values.

    4. If you have the features and not just implicit ratings, you could probably experiment with clustering techniques, perhaps start with Hierarchical Clustering.

    5. I did not quite get your last question.