Tags: neo4j, cypher, query-optimization

Neo4j Cypher calculate count with collect/size/unwind and performance/scalability of the query


There is a trick for calculating the total count of elements within the same query that returns the paginated data, for example:

// ... some calculations
WITH childD
WITH collect({childD: childD}) AS aggregate
WITH aggregate, size(aggregate) AS count
UNWIND aggregate AS item
WITH count, item.childD AS childD
// ... proceed

I am now deciding whether to take this approach or not. I am concerned about the scalability and performance of such a query. Will I have problems with this approach and should I avoid it, or is this the normal way to go? Currently I can only test it on 20k nodes, but what about 100k or more? Could you please answer this question from a Neo4j Cypher theoretical perspective? Thanks!

UPDATED

This is why I use a map: I need to carry several variables through the collect, not just one:

WITH childD, weight, totalVotes
WITH collect({childD: childD, weight: weight, totalVotes: totalVotes}) AS aggregate
WITH aggregate, size(aggregate) AS count
UNWIND aggregate AS item
WITH count, item.childD AS childD, item.weight AS weight, item.totalVotes AS totalVotes
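For completeness, the count carried through the UNWIND can then be returned next to a paginated page. A minimal sketch, assuming the stated pagination use case; the ORDER BY key, SKIP value, and page size here are illustrative, not from the original query:

WITH childD, weight, totalVotes
WITH collect({childD: childD, weight: weight, totalVotes: totalVotes}) AS aggregate
WITH aggregate, size(aggregate) AS count
UNWIND aggregate AS item
WITH count, item.childD AS childD, item.weight AS weight, item.totalVotes AS totalVotes
// page the unwound rows; count stays the same on every row
ORDER BY weight DESC
SKIP 0 LIMIT 10
RETURN childD, weight, totalVotes, count AS total

Every returned row carries the same total, so the client can read it from any row of the page.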

Solution

  • Yes, that is the correct way to use collect and UNWIND to get a group count in Cypher. However, I don't understand why you are creating a map in the collect statement; I would do it this way:

    // ... some calculations
    WITH childD
    WITH collect(childD) AS aggregate
    WITH aggregate, size(aggregate) AS count
    UNWIND aggregate AS childD
    WITH count, childD
    // ... proceed
    

    This blog post might help. https://medium.com/neo4j/kickstart-your-transition-from-sql-analytic-and-window-functions-to-neo4j-987d67f7fdb4
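An equivalent variant worth knowing (a sketch, not from the original answer): because collect() and count() are both aggregating functions, they can be computed in the same WITH clause, which avoids the extra size() step:

    WITH childD
    // collect and count are both aggregates over the same rows,
    // so one WITH produces the list and its count together
    WITH collect(childD) AS aggregate, count(childD) AS count
    UNWIND aggregate AS childD
    WITH count, childD

The performance characteristics are the same either way: all matched rows must be materialized in memory to know the total before paginating, which is the scalability concern raised in the question.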