I'm starting in Neo4j and I'm doing the GraphAcademy intermediate cypher course.
The use case is to find the highest average rating by a user for a Tom Hanks Movie, so I wrote the query like so
match (p:Person{name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie)<-[r:RATED]-(u:User)
with avg(r.rating) as average, r.rating as rating, m.title as movie
return movie, average, rating order by average desc limit 1
which returns "Catch Me If You Can" 5.0 5.0 ---> wrong answer...
if I instead take the rating out of the query as
match (p:Person{name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie)<-[r:RATED]-(u:User)
with avg(r.rating) as average, m.title as movie
return movie, average order by average desc limit 1
it returns "Captain Phillips" 4.2 ---> correct answer
why is the returned value different if I include rating? results are ordered by average in both of them...but if I include rating highest average becomes 5.0 against the 4.2 when rating is omitted
I also tried to compute the average in the return clause but with the same wrong result
match (p:Person{name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie)<-[r:RATED]-(u:User)
with r.rating as rating, m.title as movie
return movie, avg(rating) as average, rating order by average desc limit 1
avg()
is a grouping function, and by including r.rating AS rating
in the WITH
statement, it will calculate the avg()
by rating
. So it groups all the cases where rating
is the same and calculates the average.
When you remove the LIMIT 1
in your first query, you will see that you will get a number of rows with the same value for rating
and average
.