Search code examples
neo4jcypher

What is the real benefit of Cypher's WITH clause?


I am following Neo4j's 'Intermediate Cypher Queries' course in the neo4j graph academy, and I've been introduced to the WITH clause, whose basic function is to define or re-define the scope of variables. Now for some reason I can't quite wrap my head around the use of the WITH clause with the examples they have given, especially in regard to pipelining. For example, in one of the exercises I am told to use WITH to aggregate intermediate results. Here is the correct answer:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)<-[r:RATED]-(:User)
WHERE p.name = 'Tom Hanks'
WITH m, avg(r.rating) AS avgRating
RETURN m.title AS Movie, avgRating AS `AverageRating`
ORDER BY avgRating DESC

But, to my mind, the WITH clause doesn't really do much work. To convince myself, I re-wrote the query to get the same result without the WITH clause:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)<-[r:RATED]-(:User)
WHERE p.name = 'Tom Hanks'
RETURN m.title AS Movie, avg(r.rating) AS `AverageRating`
ORDER BY avg(r.rating) DESC

This works fine, with one less line of code. Perhaps the issue is just of example – in much longer queries the 'WITH method' would come into its own. But, as it stands, I can't fully account for the real use of WITH. So, for example, they talk about pipelining results, but we specified 'm' right at the start in the MATCH clause, so why are we bothering to have a WITH clause with it the 'm' variable in it again? As for the 'avg(r.rating)', really it just seems like we're wasting time renaming the result of a query when this is something we can just do as the end as I have done. So, what's really going on here? Can someone enlighten me?


Solution

  • The WITH clause is helpful when you want to do intermediate aggregations or do several aggregations in sequence. You could also do intermediate filtering. Think of it as an option to manipulate/transform data in the middle of a query statement.

    MATCH (p:Person)-[:ACTED_IN]->(m:Movie)<-[r:RATED]-(:User)
    WITH m, avg(r.rating) AS avgRating
    WHERE avgRating > 8
    RETURN m.title AS Movie, avgRating AS `AverageRating`
    ORDER BY avgRating DESC
    

    Here is one example where you perform intermediate aggregation combined with filtering, that otherwise wouldn't be possible without a WITH statement as the average rating has to be calculated, and you can't filter results in the RETURN statement