Tags: java, mahout, recommendation-engine, collaborative-filtering

Apache Mahout: combining different information as a rating


I'm new to Mahout and trying to write a user-based recommender system. I read the book Mahout in Action, but one question remains unanswered for me.

Does it make any sense to combine two or more pieces of information about a user-item relationship into a single rating value?

I have the following information:

  1. Whether or not a user has downloaded an item (boolean). I could go with a boolean recommender.
  2. User ratings (up/down) on the same items, so I could use those instead.

The problem is that ratings are very sparse and not available in the historical data.

That's why I was thinking of doing something like this:

A rating is either +1.0 or -1.0 (thumbs up or down). If no rating is present, I use 0.6 (or similar) as the rating if the user downloaded the item. Otherwise no relationship is added (= potential recommendation).
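The mapping described above can be sketched in plain Java as follows. This is only an illustration of the scheme in the question, not Mahout code: the class name `CombinedPreference`, the `Thumb` enum, and the `preference` method are hypothetical names, and 0.6 is the tentative download weight from the text.

```java
import java.util.OptionalDouble;

// Hypothetical helper, not part of Mahout: turns the two signals into one value.
class CombinedPreference {
    enum Thumb { UP, DOWN }

    // Assumed weight for "downloaded but not rated" (0.6 or similar, per the question).
    static final double DOWNLOAD_ONLY = 0.6;

    /**
     * @param thumb      the explicit rating, or null if the user did not rate the item
     * @param downloaded whether the user downloaded the item
     * @return the preference value, or empty if no user-item relationship should be stored
     */
    static OptionalDouble preference(Thumb thumb, boolean downloaded) {
        if (thumb != null) {
            // An explicit rating always wins over the download signal.
            return OptionalDouble.of(thumb == Thumb.UP ? 1.0 : -1.0);
        }
        if (downloaded) {
            return OptionalDouble.of(DOWNLOAD_ONLY);
        }
        // No rating and no download: leave the pair out so the item stays recommendable.
        return OptionalDouble.empty();
    }

    public static void main(String[] args) {
        System.out.println(preference(Thumb.DOWN, true));
        System.out.println(preference(null, true));
        System.out.println(preference(null, false));
    }
}
```

Each non-empty result could then be written out as a `userID,itemID,value` line, which is the CSV layout Mahout's `FileDataModel` reads.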

Is this any good? I may have other signals to factor in as well, such as whether someone has added an item to their favorites.

I would test it out, but the evaluators use the rating value itself to determine how close a recommendation is, which of course makes them useless for testing which rating value is a good choice.


Solution

    1. One possible (naive) solution is to take the rating the person gave the item and divide it by the average rating of that item across all users or, if the data is even sparser, by the average rating of the item's category. In the absence of a rating, you could just use the average score of the item or item category.

    2. You could then add this score to your boolean, perhaps with an added condition where you give a boolean value of 0.5 if the person has rated an item but not downloaded it. Perhaps they accessed the item from another source.

    3. See section 3.1.1 of this paper, which describes how the authors create a normalized user-item matrix before applying SVD. It should help.

    4. I am not sure how well this method will work; it might introduce some bias for items that are rated by very few people.
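A minimal sketch of steps 1 and 2, in plain Java rather than Mahout. All names here (`NormalizedScore`, `baseline`, `ratingScore`, `downloadScore`, `combined`, and the `minRatings` threshold for deciding when an item is too sparse) are assumptions made for illustration. It also assumes ratings on a positive scale; with +1/-1 thumbs the average can be zero or negative, so the division would not be meaningful without rescaling first.

```java
import java.util.List;

// Hypothetical helper illustrating the answer's steps 1 and 2.
class NormalizedScore {
    /** Mean of the given ratings; NaN when the list is empty. */
    static double mean(List<Double> ratings) {
        return ratings.stream().mapToDouble(Double::doubleValue).average().orElse(Double.NaN);
    }

    /** Item average, or the category average when the item has fewer than minRatings ratings. */
    static double baseline(List<Double> itemRatings, List<Double> categoryRatings, int minRatings) {
        return itemRatings.size() >= minRatings ? mean(itemRatings) : mean(categoryRatings);
    }

    /** Step 1: the user's rating divided by the baseline; the baseline itself if unrated. */
    static double ratingScore(Double userRating, double baseline) {
        if (!(baseline > 0)) { // also rejects NaN from an empty rating list
            throw new IllegalArgumentException("baseline must be positive");
        }
        // Note: the unrated fallback is on the raw rating scale, as the answer suggests.
        return userRating == null ? baseline : userRating / baseline;
    }

    /** Step 2: 1.0 if downloaded, 0.5 if rated but not downloaded, 0.0 otherwise. */
    static double downloadScore(boolean downloaded, boolean rated) {
        if (downloaded) {
            return 1.0;
        }
        return rated ? 0.5 : 0.0;
    }

    /** Sum of the rating component and the boolean component. */
    static double combined(Double userRating, boolean downloaded, double baseline) {
        return ratingScore(userRating, baseline) + downloadScore(downloaded, userRating != null);
    }
}
```

For example, with item ratings of 4.0 and 2.0 the baseline is 3.0, so a user who rated the item 4.5 and downloaded it gets 4.5 / 3.0 + 1.0. The bias mentioned in point 4 shows up in `baseline`: with very few ratings the item average is noisy, which is why the sketch falls back to the category average below `minRatings`.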