I am using neo4j to setup a recommender system. I have the following setup:
Nodes:
Relationships
(m:Movie)-[w:WEIGHT {weight: 10}]->(a:Attribute)
(u:User)-[r:RATED {rating: 5}]->(m:Movie)
Here is a diagram of how it looks:
I am now trying to figure out how to apply a collaborative filtering scheme that works as follows:
user
has liked (implicitly by liking the movies) other users
that have liked these similar attributesuser
, which the user has NOT seen, but similar other users
have seen.The condition is obviously that each attribute has a certain weight for each movie. E.g. the genre adventure
can have a weight of 10
for the Lord of Rings but a weight of 5
for the Titanic.
In addition, the system needs to take into account the ratings for each movies. E.g. if other user
has rated Lord of the Rings 5
, then his/her attributes of the Lord of Ranges are scaled by 5
and not 10
. The user
that has rated the implicit attributes also close to 5
should then get this movie recommended as opposed to another user that has rated similar attributes higher.
I made a start by simply recommending only other movies that other users have rated, but I am not sure how to take into account the relationships RATING and WEIGHT. It also did not work:
MATCH (user:User)-[:RATED]->(movie1)<-[:RATED]-(ouser:User),
(ouser)-[:RATED]->(movie2)<-[:RATED]-(oouser:User)
WHERE user.uid = "user4"
AND NOT (user)-[:RATED]->(movie2)
RETURN oouser
What you are looking for, mathematically speaking, is a simplified Jaccard index between two users. That is, how similar are they based on how many things they have in common. I say simplified because we are not taking into account the movies they disagree about. Essentially, and following your order, it would be:
1) Get the total weight of every Attribute for every user. For instance:
MATCH (user:User{name:'user1'})
OPTIONAL MATCH (user)-[r:RATED]->(m:Movie)->[w:WEIGHT]->(a:Attribute)
WITH user, r.rating * w.weight AS totalWeight, a
WITH user, a, sum(totalWeight) AS totalWeight
We need the last line because we had a row for each Movie-Attribute combination
2) Then, we get users with similar tastes. This is a performance danger zone, some filtering might be neccesary. But brute forcing it, we get users that like each attribute within an 10% error (for instance)
WITH user, a, totalWeight*0.9 AS minimum, totalWeight*1.10 AS maximum
MATCH (a)<-[w:WEIGHT]-(m:Movie)<-[r:RATES]-(otherUser:User)
WITH user, a, otherUser
WHERE w.weight * r.rating > minimum AND w.weight * r.rating < maximum
WITH user, otherUser
So now we have a row (unique because of last line) with any otherUser that is a match. Here, to be honest, I would need to try to be sure if otherUsers with only 1 genre match would be included.. if they are, an additional filter would be needed. But I think that should go after we get this going.
3) Now it´s easy:
MATCH (otherUser)-[r:RATES]->(m:Movie)
WHERE NOT (user)-[:RATES]->(m)
RETURN m, sum(r.rating) AS totalRating ORDER BY totalRating DESC
As mentioned before, the tricky part is 2), but after we know how to get the math going, it should be easier. Oh, and about math, for it to work properly, total weights for a movie should sum 1 (normalizing). In any other case, the difference between total weights for movies would cause an unfair comparison.
I wrote this without proper studying (paper, pencil, equations, statistics) and trying the code in a sample dataset. I hope it can help you anyway!
In case you want this recommendation without taking into account user ratings or attribute weights, it should be enough to substitute the math in lines in 1) and 2) with just r.rating or w.weight, respectively. RATES and WEIGHTS relationships would still be used, so for instance an avid consumer of Adventure movies would be recommended Movies by consumers of Adventure movies, but not modified by ratings or by attribute weight, as we chose.
EDIT: Code edited to fix syntax errors discussed in comments.