hadoop, mahout, recommendation-engine, mahout-recommender

Mahout Datamodel with duplicate user,item entries but different preference values


How does the distributed Mahout recommender job org.apache.mahout.cf.taste.hadoop.item.RecommenderJob handle CSV files that contain duplicate or triplicate user,item entries with different preference values? For example, suppose I have a .csv file with entries like:

1,1,0.7
1,2,0.7
1,2,0.3
1,3,0.7
1,3,-0.7

How would Mahout's data model handle this? Would it sum the preference values for a given user,item entry (e.g. for user,item 1,2 the preference would be 0.7 + 0.3), average them (e.g. for user,item 1,2 the preference would be (0.7 + 0.3)/2), or default to the last user,item entry it encounters (e.g. for user,item 1,2 the preference would be set to 0.3)?

I ask because I am considering recommendations based on multiple preference metrics (item views, likes, dislikes, saves to shopping cart, etc.). It would be helpful if the data model treated the preference values as linear weights (e.g. an item view plus a save to wish list yields a higher preference score than an item view alone). If the data model already handles this by summing, it would save me the chore of an additional map-reduce pass to sort and calculate total scores across multiple metrics. Any clarification on how Mahout's .csv data model works in this respect for org.apache.mahout.cf.taste.hadoop.item.RecommenderJob would be much appreciated. Thanks.


Solution

  • No, it overwrites. The model is not additive. However, the model in Myrrix, a derivative of this code (which I'm commercializing), has a fundamentally additive data model, for just the reason you give. The input values are weights and are always added.
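
Since the distributed job does not add duplicates together, one option is to collapse them yourself before handing the file to RecommenderJob. The following is a minimal sketch in plain Java, not Mahout code. The class name PreferenceAggregator and the method sumDuplicates are hypothetical, and the sketch assumes every line has the form userID,itemID,value and that the data fits in memory. For large inputs, the same per-key sum would more likely be done as a map-reduce step.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Mahout): pre-aggregates duplicate
// user,item rows so that each pair appears once with a summed weight.
class PreferenceAggregator {

    // Sums the preference values for each distinct "userID,itemID" pair.
    // Assumes each non-blank line has the form userID,itemID,value.
    static Map<String, Double> sumDuplicates(List<String> lines) {
        // LinkedHashMap keeps pairs in the order they were first seen.
        Map<String, Double> totals = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = trimmed.split(",");
            if (fields.length < 3) {
                throw new IllegalArgumentException("Expected userID,itemID,value but got: " + line);
            }
            String key = fields[0].trim() + "," + fields[1].trim();
            double value = Double.parseDouble(fields[2].trim());
            // Add the value to the running total for this user,item pair.
            totals.merge(key, value, Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // The sample rows from the question.
        List<String> input = Arrays.asList(
                "1,1,0.7", "1,2,0.7", "1,2,0.3", "1,3,0.7", "1,3,-0.7");
        // Print one userID,itemID,total line per distinct pair.
        sumDuplicates(input).forEach(
                (pair, total) -> System.out.println(pair + "," + total));
    }
}
```

With the sample rows from the question, the pair 1,2 ends up with a total of roughly 1.0 and the pair 1,3 with 0.0, subject to floating-point rounding. The output has one line per user,item pair, so it no longer matters which duplicate the job would have kept.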