Tags: algorithm, linear-algebra, recommendation-engine, predict

Recommendation system and baseline predictors


I have a data set where the first column is the user, the second column is the movie, and the third is a rating on a ten-point scale.

0 0 9
0 1 8
1 1 4
1 2 6
2 2 7

I have to predict the third number for another set of data (user, movie, ?):

0 2
1 0
2 0
2 1

I compute the bias values as shown in https://youtube.com/watch?v=dGM4bNQcVKI and make the predictions as shown in https://www.youtube.com/watch?v=4RSigTais8o.

Bias value for user number 0: (9 + 8) / 2 = 8.5, then 8.5 - 1.5 = 7.


Bias value for movie number 2: (6 + 7) / 2 = 6.5, then 6.5 - 1.5 = 5.


And baseline predictors:

$$\hat r_{um} = \mu + b_u + b_m$$

That gives 1.5 + 7 + 5 = 13.5, but the expected contest result is 7.052009.

But the problem description says the result of my Recommendation system should be:

0 2 7.052009
1 0 6.687943
2 0 6.995272
2 1 6.687943

Where is my mistake?


Solution

  • The raw average is the average of ALL the present scores: (9+8+4+6+7) / 5 = 6.8. I don't see that number anywhere in your calculation, so I guess that's your error.

    In the video, the professor uses a raw average of 3.5 in all the calculations, including the bias calculation, but skips how that number is reached. If you add all the numbers in the video's table and divide by their count, you get 3.5.

    Using your videos as a guide, the answer for the first one is 0 2 8.2. The videos claim to avoid calculus; the different final answers in the contest probably come from using the "full" method.

    0 2 ?, user 0 (row 0: 9 8 x), movie 2 (column 2: x 6 7)

    raw average = 6.8
    bias user 0: (9+8) / 2 - 6.8 = 1.7
    bias movie 2: (6+7) / 2 - 6.8 = -0.3
    prediction: 6.8+1.7-0.3 = 8.2
    

    The problem looks like a variation of the Netflix contest: the contest host knows the actual answers (the ratings) but doesn't give them to you. You are expected to predict them, and the winner is whoever gets closest to the actual answers.

    The winner of your contest got the closest, but he got there using an unknown method or his own variation of a known method. If your goal is to match his answer exactly, you are better off asking him which method he used and how he modified it, then trying to replicate his results.

    If this were homework and not a contest, the teacher would expect you to use the "correct" method he taught you, exactly as taught (there's no single set method, just many methods with different accuracy). But this is a contest: your goal is to find the base method that approximates best (the one you used has very low accuracy) and tinker with it to get even better results.

    If you want to understand the link, I suggest you research it and then ask a statistics question, because it's just plain statistics. You can try to understand the link or research matrix factorization on your own. Remember that to get contest-winning results (or close), you won't be able to use a simple method like the one from the YouTube video; you'll need a method with a lot more math.
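
The simple baseline from the worked calculation in this answer (global average plus a user bias and a movie bias, each taken as a plain mean minus the global average) could be sketched in Python roughly as follows. The function names `baseline_predictor` and `predict` are made up for illustration, and falling back to a bias of 0 for an unseen user or movie is an assumption, not something the videos or the contest specify. This is the unregularized method, so it is not expected to reproduce the contest's numbers.

```python
from collections import defaultdict

# Training data from the question: (user, movie, rating)
ratings = [(0, 0, 9), (0, 1, 8), (1, 1, 4), (1, 2, 6), (2, 2, 7)]
# Pairs to predict: (user, movie)
targets = [(0, 2), (1, 0), (2, 0), (2, 1)]


def baseline_predictor(ratings):
    """Return (mu, user_bias, movie_bias) using plain, unregularized averages."""
    # Raw average over ALL known ratings
    mu = sum(r for _, _, r in ratings) / len(ratings)
    by_user, by_movie = defaultdict(list), defaultdict(list)
    for u, m, r in ratings:
        by_user[u].append(r)
        by_movie[m].append(r)
    # Bias = mean of that user's (or movie's) ratings minus the raw average
    user_bias = {u: sum(rs) / len(rs) - mu for u, rs in by_user.items()}
    movie_bias = {m: sum(rs) / len(rs) - mu for m, rs in by_movie.items()}
    return mu, user_bias, movie_bias


def predict(mu, user_bias, movie_bias, user, movie):
    # Assumption: an unseen user or movie contributes a bias of 0
    return mu + user_bias.get(user, 0.0) + movie_bias.get(movie, 0.0)


mu, user_bias, movie_bias = baseline_predictor(ratings)
predictions = {(u, m): predict(mu, user_bias, movie_bias, u, m) for u, m in targets}
for (u, m), p in predictions.items():
    print(u, m, round(p, 6))
```

On the question's data this gives 8.2 for (0, 2), matching the hand calculation, and 7.2, 9.2 and 6.2 for the other three pairs. All four are noticeably off from the contest's expected values, which is consistent with the contest using a more elaborate method.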