I'm trying to build a basic collaborative filtering recommendation system using MySQL. I have a user rating table like this:
user_id movie_id rating
1 131 342 3 <<< User 131 has rated movie 342
2 131 312 5 <<< and also 312
3 122 312 4
4 213 342 5
5 141 342 5 <<< User 141 has rated 342
6 141 312 4 <<< and also 312 (2 movies in common)
7 141 323 3
So I'm trying to find similar users to 131. I want to return the users who have at least two movies in common and the ratings are above 3. So it should return rows 5 and 6 (as shown above).
This is what I have so far:
SELECT * from user_ratings
WHERE rating >= 3
AND movie_id IN (SELECT movie_id from user_rating WHERE user_id = 131)
AND user_id != 131
This returns:
user_id movie_id rating
3 122 312 4 <<< Don't want these two
4 213 342 5 <<<
5 141 342 5
6 141 312 4
It returns the movies that users have in common with 131, but I need it to only show the users who have at least two items in common. How could I do this? I'm unsure of how to proceed :(
You can first find the user_id
s that have more or equal number of movies as user_id = 131
with rating > 3
. Then use IN
in the WHERE
clause to get the additional data:
SELECT *
FROM user_ratings
WHERE
user_id IN(
SELECT user_id
FROM user_ratings
WHERE
movie_id IN (SELECT movie_id FROM user_ratings WHERE user_id = 131)
AND rating > 3
GROUP BY user_id
HAVING
COUNT(*) >= (SELECT COUNT(*) FROM user_ratings WHERE user_id = 131)
)
AND rating > 3