mahout, mahout-recommender

Apache Mahout Normalization Rating Dataset


Is it possible to normalize the rating data using mean-centering or z-scores before I apply the rest of the algorithm?

Thanks in advance.


Solution

  • Are you trying to predict ratings, or are you trying to recommend products for consumption?

    Very few uses of a recommender are really about predicting ratings. They are usually about ranking recommendations as well as possible so that the top few can be shown. In that case the log-likelihood ratio works best, and it ignores rating values, since it calculates weights using a probabilistic method.

    If you have thumbs-down ratings mixed with thumbs-up ratings, you need to decide what is unambiguously thumbs-up, because you only want to recommend good things. For instance, with a 1-5 star rating system it might be best to throw away all 1-3 ratings and use only 4-5. This seems counter-intuitive to some, but it really does produce better ranking. If you are running offline cross-validation tests, make sure to use something like mean average precision. You want a precision measure, since it measures ranking; don't use RMSE, which measures rating prediction error.

    If you are sure you want to predict ratings, you can normalize each individual's ratings so that all users share the same scale. In the recommender, however, don't use SIMILARITY_LOGLIKELIHOOD; use SIMILARITY_COSINE, which doesn't ignore preference weights. Then you can measure RMSE for cross-validation.
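
For the rating-prediction case, the per-user normalization asked about in the question could be done as a preprocessing step before the data reaches the recommender. The following is only a minimal sketch in plain Python, not Mahout API: the function names (mean_center, z_score, rmse) and the nested-dict data layout are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, pstdev


def mean_center(ratings):
    """Subtract each user's mean rating from that user's ratings.

    `ratings` maps user -> {item: rating} (hypothetical layout).
    """
    centered = {}
    for user, items in ratings.items():
        mu = mean(list(items.values()))
        centered[user] = {item: r - mu for item, r in items.items()}
    return centered


def z_score(ratings):
    """Mean-center, then divide by the user's population standard deviation."""
    scaled = {}
    for user, items in ratings.items():
        values = list(items.values())
        mu = mean(values)
        sigma = pstdev(values)
        # A user who rates everything identically has sigma == 0;
        # map those ratings to 0.0 instead of dividing by zero.
        scaled[user] = {
            item: (r - mu) / sigma if sigma > 0 else 0.0
            for item, r in items.items()
        }
    return scaled


def rmse(predicted, actual):
    """Root mean squared error over paired predicted/actual ratings."""
    pairs = list(zip(predicted, actual))
    return sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


if __name__ == "__main__":
    raw = {
        "alice": {"a": 5, "b": 3, "c": 4},
        "bob": {"a": 2, "b": 2},
    }
    print(mean_center(raw))
    print(z_score(raw))
```

If predictions are produced on the normalized scale, they would need to be mapped back to the original scale (for example by adding the user's mean again) before RMSE is computed against the raw held-out ratings.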