Search code examples
neo4jcypherlist-comprehension

Cypher multiple OPTIONAL MATCH - Pattern Comprehension - COUNT DISTINCT


I have read a lot of comments about OPTIONAL MATCH and Pattern Comprehesion, but I can't find a solution for my case.

I have a node (Account) in my Neo4j Database and I'd like to count the nodes which belongs to each account. The following code works with one or two optional matches, but the many optional matches produce a cross product and a timeout.

// Account
MATCH (a:Account{billingCountry: "DE", isDeleted: false})
WHERE a.id IS NOT NULL

// User
MATCH (a)<-[:CREATED]-(u:User)

// Contact
OPTIONAL MATCH (a) <-[:CONTACT_OF]- (c:Contact{isDeleted: false})

// Opportunity
OPTIONAL MATCH (a) <-[:OPPORTUNITY_OF]- (o:Opportunity{isDeleted: false, s4sMarked_For_Deletion__C: false})

// Open Opportunity
OPTIONAL MATCH (a)<-[:OPPORTUNITY_OF]-(open:Opportunity{isClosed: false, isDeleted: false})

// Attribute
OPTIONAL MATCH (a) <-[:ATTRIBUTE_OF]- (aa:Attribute_Assignment{isDeleted: false})

// Sales Planning
OPTIONAL MATCH (a) <-[:SALESPLAN_OF]- (s:Sales_Planning)

// Task
OPTIONAL MATCH (a) <-[:TASK_OF]- (t:Task{isDeleted: false})

// Event
OPTIONAL MATCH (a) <-[:EVENT_OF]- (e:Event{isDeleted: false})

// Contract
OPTIONAL MATCH (a) <-[:CONTRACT_OF]- (ct:Contract{isDeleted: false})

RETURN
a.id

u.name AS User_Name,
u.department AS User_Department,

COUNT(DISTINCT c.id) AS Contact_Count,
COUNT(DISTINCT o.id) AS Opportunity_Count,
COUNT(DISTINCT open.id) AS OpenOpp_Count,
COUNT(DISTINCT aa.id) AS Attribute_Count,
COUNT(DISTINCT s.timeYear) AS Sales_Plan_Count,
COUNT(DISTINCT t.id) AS Task_Count,
COUNT(DISTINCT e.id) AS Event_Count,
COUNT(DISTINCT ct.id) AS Contract_Count

I can rewrite the query with a Pattern Compression, but then I just get back the non distinct ids in arrays. Is there a way to count the distinct values inside the arrays or another way how to count the values in pattern compression?

MATCH (a:Account{billingCountry: "DE", isDeleted: false})
WHERE a.id IS NOT NULL

RETURN a.id,
[
[(a)<-[:CONTACT_OF]- (c:Contact{isDeleted: false}) | c.id],
[(a)<-[:OPPORTUNITY_OF]- (o:Opportunity{isDeleted: false, s4sMarked_For_Deletion__C: false}) | o.id],
[(a)<-[:OPPORTUNITY_OF]-(open:Opportunity{isClosed: false, isDeleted: false}) | open.id],
[(a) <-[:ATTRIBUTE_OF]- (aa:Attribute_Assignment{isDeleted: false}) | aa.id],
[(a) <-[:SALESPLAN_OF]- (s:Sales_Planning) | s.timeYear],
[(a) <-[:TASK_OF]- (t:Task{isDeleted: false}) | t.id],
[(a) <-[:EVENT_OF]- (e:Event{isDeleted: false}) | e.id],
[(a) <-[:CONTRACT_OF]- (ct:Contract{isDeleted: false}) | ct.id]
]

If I made a formal mistake in my first stockoverflow post, I would appreciate feedback :)


Solution

  • The problem lies, in the RETURN statement, because you are calculating all the counts at the last, neo4j has to calculate the cartesian products. If you calculate each node count at each step, it will be much more optimal. Like this:

    MATCH (a:Account{billingCountry: "DE", isDeleted: false})
    WHERE a.id IS NOT NULL
    MATCH (a)<-[:CREATED]-(u:User)
    OPTIONAL MATCH (a) <-[:CONTACT_OF]- (c:Contact{isDeleted: false})
    WITH a, u, COUNT(DISTINCT c.id) AS Contact_Count,
    OPTIONAL MATCH (a) <-[:OPPORTUNITY_OF]- (o:Opportunity{isDeleted: false, s4sMarked_For_Deletion__C: false})
    WITH a, u, Contact_Count, COUNT(DISTINCT o.id) AS Opportunity_Count
    OPTIONAL MATCH (a)<-[:OPPORTUNITY_OF]-(open:Opportunity{isClosed: false, isDeleted: false})
    WITH a, u, Contact_Count, Opportunity_Count, COUNT(DISTINCT open.id) AS OpenOpp_Count
    OPTIONAL MATCH (a) <-[:ATTRIBUTE_OF]- (aa:Attribute_Assignment{isDeleted: false})
    WITH a, u, Contact_Count, Opportunity_Count, OpenOpp_Count, COUNT(DISTINCT aa.id) AS Attribute_Count
    OPTIONAL MATCH (a) <-[:SALESPLAN_OF]- (s:Sales_Planning)
    WITH a, u, Contact_Count, Opportunity_Count, OpenOpp_Count, Attribute_Count,COUNT(DISTINCT s.timeYear) AS Sales_Plan_Count
    OPTIONAL MATCH (a) <-[:TASK_OF]- (t:Task{isDeleted: false})
    WITH a, u, Contact_Count, Opportunity_Count, OpenOpp_Count, Attribute_Count, Sales_Plan_Count, COUNT(DISTINCT t.id) AS Task_Count
    OPTIONAL MATCH (a) <-[:EVENT_OF]- (e:Event{isDeleted: false})
    WITH a, u, Contact_Count, Opportunity_Count, OpenOpp_Count, Attribute_Count, Sales_Plan_Count, Task_Count, COUNT(DISTINCT e.id) AS Event_Count
    OPTIONAL MATCH (a) <-[:CONTRACT_OF]- (ct:Contract{isDeleted: false})
    RETURN
    a.id, u.name AS User_Name, u.department AS User_Department, Contact_Count,
    Opportunity_Count, OpenOpp_Count, Attribute_Count, Sales_Plan_Count,
    Task_Count, Event_Count, COUNT(DISTINCT ct.id) AS Contract_Count