Search code examples
neo4jcountcypherrelationshiprows

Neo4j - Counting rows with where condition


I'm trying to count the amount of rows that Neo4j will return but the count (or the query) is very slow.

Version 1 (70 sec):

MATCH (person:Person)-[:HAS_ORDER]->(order:Order)
WHERE order.timestamp >= 1632434400 AND size((order)<-[:HAS_ORDER]-(:OrderLine)-[:HAS_PRODUCT]->(:Product)) <= 20
WITH order
MATCH (order)<-[:HAS_ORDER]-(:OrderLine)-[:HAS_PRODUCT]->(product:Product)
RETURN COUNT(product);

Version 2 (68 sec.):

MATCH (person:Person)-[:HAS_ORDER]->(order:Order)
WITH size((order)<-[:HAS_ORDER]-(:OrderLine)-[:HAS_PRODUCT]->(:Product)) AS amount
WHERE order.timestamp >= 1632434400 AND amount <= 20
RETURN SUM(amount)

Using Neo4j 4.4 community with about 800000 orders and about 17000000 order lines.

Is there a more efficient way to count the rows?

These are the indexes:

CREATE INDEX idx_order_torder_id FOR (n:Order) ON (n.order_id);
CREATE INDEX idx_order_timestamp FOR (n:Order) ON (n.timestamp);
CREATE INDEX idx_person_person_id FOR (n:Person) ON (n.person_id);
CREATE INDEX idx_product_product_id FOR (n:Product) ON (n.product_id);

The amount of rows are equal to 4269011.

The EXPLAIN plan: enter image description here


Solution

  • Because every order line has one product, i can skip the counting of the relation order lines to products:

    MATCH (order:Order) 
    WHERE order.timestamp >= 1632434400 
    WITH order 
    MATCH (order)<-[:HAS_ORDER]-(orderLine:OrderLine) 
    WITH COUNT(orderLine) as productCount 
    WHERE productCount <= 20 
    RETURN SUM(productCount);
    

    This query took 0m17.342s

    But i managed to snoop some seconds with the following query:

    MATCH (order:Order) 
    WHERE order.timestamp >= 1632434400
    WITH order, size((order)<-[:HAS_ORDER]-(:OrderLine)) AS amount 
    WHERE amount <= 20 
    RETURN SUM(amount);
    

    This query took 0m15.675s