normalization mean standard-deviation rating-system

Removing bias in user ratings


I have a dataset of user ratings on images. I am normalizing the ratings with mean/standard-deviation normalization (subtracting each user's mean rating and dividing by that user's standard deviation) to remove bias caused by user-specific preferences. Is this a correct way to handle bias, or is there another way to remove bias from user ratings?


Solution

  • This is certainly wrong on a couple of points:

    • If you 'normalise' input by standard deviation in this way, what you are saying is that "low variability doesn't matter much, only the outliers really count", because only the outliers will themselves have a deviation larger than the standard deviation.
    • You are dealing with 'votes' of user satisfaction, not 'measurements'. Bias, by definition, is information about satisfaction, and you are throwing it away. For example, 150 years ago people used to find "No dogs, no Irish" signs acceptable; these days, not so much. If you want to predict how well a restaurant is likely to be regarded after a visit, you can't discount 0-star votes merely because the people objected to the sign!
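
To make both points concrete, here is a minimal sketch of the per-user normalization described in the question. The function name `zscore_per_user`, the dict-of-dicts layout, and the two example users are assumptions for illustration and are not taken from the asker's dataset.

```python
from statistics import mean, pstdev

def zscore_per_user(ratings):
    """Return per-user z-scores for ratings given as {user: {image: rating}}."""
    out = {}
    for user, by_image in ratings.items():
        values = list(by_image.values())
        mu = mean(values)
        sigma = pstdev(values)  # population standard deviation of this user's ratings
        # A user who gives every image the same rating has sigma == 0;
        # mapping those ratings to 0.0 is an arbitrary choice made here.
        out[user] = {
            image: (rating - mu) / sigma if sigma > 0 else 0.0
            for image, rating in by_image.items()
        }
    return out

# Hypothetical data: one user dislikes almost everything, the other likes almost everything.
ratings = {
    "harsh": {"a": 1, "b": 1, "c": 1, "d": 2},
    "generous": {"a": 4, "b": 4, "c": 4, "d": 5},
}
normalized = zscore_per_user(ratings)
print(normalized["harsh"])
print(normalized["generous"])
```

With this data the two users end up with the same normalized scores, so the fact that one of them was broadly dissatisfied and the other broadly satisfied is no longer visible. A one-star difference from a low-variance user is also scaled up to well over one standard deviation.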

    When it comes to using star ratings to predict how likely something is to be "enjoyed" or "regretted", you might want to read this article: https://www.evanmiller.org/how-not-to-sort-by-average-rating.html

    Note that the linked article is primarily interested in modelling, in terms of stars to award, the question "given past ratings, does the current vote indicate (a) a continuation of past 'satisfaction', (b) a shifting trend towards increasing 'satisfaction', or (c) a shifting trend towards decreasing 'satisfaction'?"
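
For reference, the score that article recommends sorting by is the lower bound of the Wilson score confidence interval for the proportion of positive ratings. The sketch below is one way to compute it, not code from the article. The function name `wilson_lower_bound` and the default `z=1.96` (roughly 95% confidence) are assumptions. The formula needs positive/negative counts, so star ratings would first have to be mapped to that form, for example by counting 4 and 5 stars as positive. That mapping is also an assumption.

```python
from math import sqrt

def wilson_lower_bound(positive, total, z=1.96):
    """Lower bound of the Wilson score interval for the fraction of positive ratings."""
    if total == 0:
        return 0.0  # no ratings yet: treated here as the lowest possible score
    p = positive / total  # observed fraction of positive ratings
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * sqrt((p * (1 - p) + z2 / (4 * total)) / total)
    return (centre - margin) / (1 + z2 / total)

# Same observed fraction (90% positive), very different amounts of evidence.
print(wilson_lower_bound(90, 100))
print(wilson_lower_bound(9, 10))
print(wilson_lower_bound(1, 1))
```

Items with few ratings get a lower score than items with the same positive fraction and many ratings, so a single 5-star vote does not outrank a long, consistently good history.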