Tags: math, recommendation-engine, collaborative-filtering

Weighted mean tending towards center


I'm experimenting with some movie rating data and am currently making hybrid item-based and user-based predictions. Mathematically, I'm unsure how to implement what I want. Maybe the answer is just a straightforward weighted mean, but I feel there might be another option.

For now I have four values that I want to average:

  1. Item-based prediction
  2. User-based prediction
  3. Global average rating of the given movie
  4. Global average rating given by the given user

As this progresses, I'll need to add other values to the mix, such as weighted similarity, genre weighting, and probably a few other things.

For now, I want to focus on the data listed above, as much for understanding as for anything else.

Here is my theory: to start, I want to weight the item-based and user-based predictions equally, and give them more weight than the global averages.
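As a baseline for the theory above, here is a minimal sketch of the plain weighted mean, using the example values given further down in the question. The 2:2:1:1 weights (predictions counted twice as much as the global averages) are an illustrative assumption, not a recommended setting.

```python
def weighted_mean(values, weights):
    """Plain weighted arithmetic mean: sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * x for x, w in zip(values, weights)) / total

# Order: item-based prediction, user-based prediction, movie average, user average.
ratings = [4.5, 2.5, 3.8, 3.6]
# Assumed weights: the two predictions count twice as much as the global averages.
weights = [2, 2, 1, 1]

blended = weighted_mean(ratings, weights)
print(round(blended, 3))  # prints 3.567
```

This blend is linear: moving any one input moves the result by a fixed fraction, no matter how far that input is from the global averages.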

Based on my very rusty maths and some basic attempts, I feel a less linear solution might be something like the harmonic mean, except that instead of naturally tending towards the smaller values, it would tend towards the global average.
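One way to make this idea precise (an illustrative construction, not a standard named formula) is to notice that the harmonic mean is a weighted arithmetic mean with weights w_i = 1/x_i, which is why small values dominate it. Swapping in weights that grow as x_i approaches a centre c gives a mean that is pulled towards c instead. Here b_i are base weights and ε > 0 is an assumed smoothing constant that keeps the weights finite when x_i = c.

```latex
H = \frac{n}{\sum_{i=1}^{n} 1/x_i}
  = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i},
  \qquad w_i = \frac{1}{x_i}

M_c = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i},
  \qquad w_i = \frac{b_i}{\lvert x_i - c \rvert + \varepsilon}
```

Smaller ε increases the pull of values near c; as ε grows, the weights become proportional to b_i and M_c approaches the plain weighted mean with weights b_i.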

For example:

  • predicted item-based rating: 4.5
  • predicted user-based rating: 2.5
  • global movie rating: 3.8
  • global user rating: 3.6

So the "centre" (global average) here would be 3.7, the mean of the two global ratings.
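Here is a minimal sketch of a centre-attracting mean applied to the example above. Each value is weighted by base_weight / (|x - centre| + eps), by analogy with the harmonic mean's 1/x weighting. The weighting rule, the 2:2:1:1 base weights, and the eps values are all illustrative assumptions, not an established method. The centre is the mean of the two global averages, as in the question.

```python
def center_attracting_mean(values, base_weights, center, eps=0.1):
    """Weighted mean whose weights grow as a value gets closer to `center`.

    Each value x gets weight base_weight / (|x - center| + eps), so values
    near `center` dominate. `eps` is an assumed smoothing parameter: smaller
    eps pulls the result harder towards values near `center`; large eps makes
    the weights roughly proportional to `base_weights`.
    """
    weights = [b / (abs(x - center) + eps) for x, b in zip(values, base_weights)]
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

item_pred, user_pred = 4.5, 2.5
movie_avg, user_avg = 3.8, 3.6
center = (movie_avg + user_avg) / 2  # about 3.7, the "centre" from the question

values = [item_pred, user_pred, movie_avg, user_avg]
base_weights = [2, 2, 1, 1]  # assumed: predictions count twice as much

for eps in (0.1, 0.5, 2.0):
    print(eps, round(center_attracting_mean(values, base_weights, center, eps), 3))
```

With eps = 0.1 the two global averages dominate and the result lands very close to 3.7. As eps grows, the result drifts back towards the plain 2:2:1:1 weighted mean (about 3.567). The pull towards the centre ignores how reliable each prediction is, so eps and the base weights would need tuning against held-out ratings.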

I may be way off base here, as my maths is quite rusty, but does anyone have thoughts on how I could represent this mathematically?

OR

Does anyone have thoughts on a different approach?


Solution

  • I recommend looking into the "Recommender Systems Handbook" by F. Ricci et al. (2011). It summarizes the common approaches used in recommender engines and provides the necessary formulas.
    Here is an excerpt from Section 4.2.3:

    As the number of neighbors used in the prediction increases, the rating predicted by the regression approach will tend toward the mean rating of item i. Suppose item i has only ratings at either end of the rating range, i.e. it is either loved or hated, then the regression approach will make the safe decision that the item’s worth is average. [...] On the other hand, the classification approach will predict the rating as the most frequent one given to i. This is more risky as the item will be labeled as either “good” or “bad”.
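To see the difference the excerpt describes, here is a minimal sketch of both prediction rules for a single item. It assumes the neighbors' ratings and similarity weights are already known. The function names and sample data are hypothetical. The regression rule is a similarity-weighted average; the classification rule returns the rating value with the largest total similarity weight.

```python
from collections import defaultdict

def predict_regression(ratings, weights):
    """Regression approach: similarity-weighted average of neighbor ratings."""
    numerator = sum(w * r for r, w in zip(ratings, weights))
    denominator = sum(abs(w) for w in weights)
    return numerator / denominator

def predict_classification(ratings, weights):
    """Classification approach: rating value with the largest total weight."""
    votes = defaultdict(float)
    for r, w in zip(ratings, weights):
        votes[r] += w
    return max(votes, key=votes.get)

# Hypothetical neighbors of a polarizing item: ratings only at 1 or 5.
neighbor_ratings = [1, 5, 5, 1, 5]
similarities = [0.9, 0.8, 0.7, 0.6, 0.5]

print(predict_regression(neighbor_ratings, similarities))
print(predict_classification(neighbor_ratings, similarities))  # prints 5
```

For this "loved or hated" item, the regression rule returns about 3.29, a rating no neighbor actually gave. The classification rule commits to 5, which is the riskier choice the excerpt mentions.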