Search code examples
neo4jcypher

Different query results depending on the variables returned Node4j


I'm starting in Neo4j and I'm doing the GraphAcademy intermediate cypher course.

The use case is to find the highest average rating by a user for a Tom Hanks Movie, so I wrote the query like so

match (p:Person{name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie)<-[r:RATED]-(u:User)
with avg(r.rating) as average, r.rating as rating, m.title as movie
return movie, average, rating order by average desc limit 1

which returns "Catch Me If You Can" 5.0 5.0 ---> wrong answer...

if I instead take the rating out of the query as

match (p:Person{name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie)<-[r:RATED]-(u:User)
with avg(r.rating) as average,  m.title as movie
return movie, average order by average desc limit 1

it returns "Captain Phillips" 4.2 ---> correct answer

why is the returned value different if I include rating? results are ordered by average in both of them...but if I include rating highest average becomes 5.0 against the 4.2 when rating is omitted

I also tried to compute the average in the return clause but with the same wrong result

match (p:Person{name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie)<-[r:RATED]-(u:User)

with r.rating as rating, m.title as movie
return movie, avg(rating) as average, rating order by average desc limit 1

Solution

  • avg() is a grouping function, and by including r.rating AS rating in the WITH statement, it will calculate the avg() by rating. So it groups all the cases where rating is the same and calculates the average.

    When you remove the LIMIT 1 in your first query, you will see that you will get a number of rows with the same value for rating and average.