
Neo4J counts filtered relationship twice


I have the following nodes:

  • Guid

  • Operation

  • Customer

I have the following relationships:

  • Guid -[PERFORMS]-> Operation

  • Operation -[OPERATION_FOR]-> Customer

On the OPERATION_FOR relationship I have a property guid, which holds the same value as the id property of the Guid node.

When I run the following query:

MATCH (guid:Guid{id:"005056AF0E7F1EECB0E4FCC33F085227"}) -[perform:PERFORMS]->(operation:Operation)
MATCH (operation) -[operationFor:OPERATION_FOR{guid:guid.id}]-> (customer:Customer)
RETURN operationFor,operation,customer

I expect 5 results. The graph view does show 5 relationships, but it reports a count of 11 relationships.

When I look at the text view, it shows 11 entries, some of which are duplicates. I've found other questions about this, but the queries in those questions didn't specify the direction of the relationship, so each relationship was counted twice. I did specify the direction, however, so I'm not sure why this happens.

operationFor | operation | customer
[:OPERATION_FOR {guid: "005056AF0E7F1EECB0E4FCC33F085227",startTime: "2022-04-23T18:50:30",endTime: "2022-04-23T19:10:36"}] | (:Operation {id: "SHIP"}) | (:Customer {id: "BP0103"})
[:OPERATION_FOR {guid: "005056AF0E7F1EECB0E4FCC33F085227",startTime: "2022-04-23T18:50:30",endTime: "2022-04-23T19:10:36"}] | (:Operation {id: "SHIP"}) | (:Customer {id: "BP0103"})
[:OPERATION_FOR {guid: "005056AF0E7F1EECB0E4FCC33F085227",startTime: "2022-04-23T17:50:30",endTime: "2022-04-23T17:50:36"}] | (:Operation {id: "SHIP"}) | (:Customer {id: "BP0103"})
[:OPERATION_FOR {guid: "005056AF0E7F1EECB0E4FCC33F085227",startTime: "2022-04-23T18:50:30",endTime: "2022-04-23T19:10:36"}] | (:Operation {id: "STORE"}) | (:Customer {id: "Flexo"})
[:OPERATION_FOR {guid: "005056AF0E7F1EECB0E4FCC33F085227",startTime: "2022-04-23T18:50:30",endTime: "2022-04-23T19:10:36"}] | (:Operation {id: "LOAD"}) | (:Customer {id: "Elision"})
[:OPERATION_FOR {guid: "005056AF0E7F1EECB0E4FCC33F085227",startTime: "2022-04-23T18:50:30",endTime: "2022-04-23T19:10:36"}] | (:Operation {id: "SHIP"}) | (:Customer {id: "BP0103"})
[:OPERATION_FOR {guid: "005056AF0E7F1EECB0E4FCC33F085227",startTime: "2022-04-23T18:50:30",endTime: "2022-04-23T19:10:36"}] | (:Operation {id: "SHIP"}) | (:Customer {id: "BP0103"})
[:OPERATION_FOR {guid: "005056AF0E7F1EECB0E4FCC33F085227",startTime: "2022-04-23T17:50:30",endTime: "2022-04-23T17:50:36"}] | (:Operation {id: "SHIP"}) | (:Customer {id: "BP0103"})
[:OPERATION_FOR {guid: "005056AF0E7F1EECB0E4FCC33F085227",startTime: "2022-04-23T18:50:30",endTime: "2022-04-23T19:10:36"}] | (:Operation {id: "SHIP"}) | (:Customer {id: "BP0103"})
[:OPERATION_FOR {guid: "005056AF0E7F1EECB0E4FCC33F085227",startTime: "2022-04-23T18:50:30",endTime: "2022-04-23T19:10:36"}] | (:Operation {id: "SHIP"}) | (:Customer {id: "BP0103"})
[:OPERATION_FOR {guid: "005056AF0E7F1EECB0E4FCC33F085227",startTime: "2022-04-23T17:50:30",endTime: "2022-04-23T17:50:36"}] | (:Operation {id: "SHIP"}) | (:Customer {id: "BP0103"})

When I use RETURN DISTINCT(operationFor),operation,customer, or run the following query instead,

MATCH (operation:Operation) -[operationFor:OPERATION_FOR{guid:"005056AF0E7F1EECB0E4FCC33F085227"}]-> (customer:Customer)
RETURN operationFor,operation,customer

I do get 5 unique entries in the table.

operationFor | operation | customer
[:OPERATION_FOR {guid: "005056AF0E7F1EECB0E4FCC33F085227",startTime: "2022-04-23T18:50:30",endTime: "2022-04-23T19:10:36"}] | (:Operation {id: "SHIP"}) | (:Customer {id: "BP0103"})
[:OPERATION_FOR {guid: "005056AF0E7F1EECB0E4FCC33F085227",startTime: "2022-04-23T18:50:30",endTime: "2022-04-23T19:10:36"}] | (:Operation {id: "SHIP"}) | (:Customer {id: "BP0103"})
[:OPERATION_FOR {guid: "005056AF0E7F1EECB0E4FCC33F085227",startTime: "2022-04-23T17:50:30",endTime: "2022-04-23T17:50:36"}] | (:Operation {id: "SHIP"}) | (:Customer {id: "BP0103"})
[:OPERATION_FOR {guid: "005056AF0E7F1EECB0E4FCC33F085227",startTime: "2022-04-23T18:50:30",endTime: "2022-04-23T19:10:36"}] | (:Operation {id: "STORE"}) | (:Customer {id: "Flexo"})
[:OPERATION_FOR {guid: "005056AF0E7F1EECB0E4FCC33F085227",startTime: "2022-04-23T18:50:30",endTime: "2022-04-23T19:10:36"}] | (:Operation {id: "LOAD"}) | (:Customer {id: "Elision"})

Solution

  • This happens because you chain two MATCH clauses. The first MATCH returns 5 rows (one for each time the Guid performed an operation, that is, one per PERFORMS relationship).

    The second MATCH is then evaluated once for each row produced by the first.

    The two operations that were performed only once each get a single result from the second MATCH as well. The operation that was performed three times, however, yields three rows from the first MATCH, and each of those rows gets three hits in the second MATCH, because the second pattern filters only on the guid property and so matches all three OPERATION_FOR relationships of that operation every time.

    Therefore you get 3*3 + 2 = 11 rows in the final result, of which only 5 are distinct.

    The solution is, as you suggest yourself, to use DISTINCT.
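
The row multiplication described above can be checked, and avoided, directly in Cypher. What follows is a minimal sketch rather than a definitive fix. It assumes the data model from the question, and it assumes that the extra rows come from several PERFORMS relationships between the same Guid and Operation nodes. The aliases operationId and performsRows are illustrative names that do not appear in the original queries.

```cypher
// 1) Diagnostic: how many rows does the first MATCH produce per operation?
//    If the explanation holds, at least one operation reports more than one row.
MATCH (guid:Guid {id: "005056AF0E7F1EECB0E4FCC33F085227"})-[perform:PERFORMS]->(operation:Operation)
RETURN operation.id AS operationId, count(perform) AS performsRows;

// 2) Alternative to RETURN DISTINCT: collapse the intermediate rows to distinct
//    (guid, operation) pairs before expanding OPERATION_FOR, so that each
//    OPERATION_FOR relationship is matched only once.
MATCH (guid:Guid {id: "005056AF0E7F1EECB0E4FCC33F085227"})-[:PERFORMS]->(operation:Operation)
WITH DISTINCT guid, operation
MATCH (operation)-[operationFor:OPERATION_FOR {guid: guid.id}]->(customer:Customer)
RETURN operationFor, operation, customer;
```

Compared with RETURN DISTINCT, the WITH DISTINCT variant removes the duplicates before the second MATCH runs rather than after it, so the intermediate result stays small. The two statements are separated by semicolons, which a client such as cypher-shell expects; they can also be run one at a time.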