
Return distinct paths


I want to get unique patterns from my graph, but when the same nodes are matched in a different order, Neo4j treats the results as different paths.

This is the pattern I want to find:

(a:Store)-[:SELLS]->(:Product)-[:SIMILAR]-(:Product)<-[:SELLS]-(b:Store)
                         |                     |
                    [:BUNDLED]            [:BUNDLED]
                         |                     |
(a:Store)-[:SELLS]->(:Product)-[:SIMILAR]-(:Product)<-[:SELLS]-(b:Store)

I tried this query:

match (a:Store)-[:SELLS]->(p1:Product)-[:BUNDLED]-(p2:Product)<-[:SELLS]-(a),
      (b:Store)-[:SELLS]->(p3:Product)-[:BUNDLED]-(p4:Product)<-[:SELLS]-(b),
      (p1)-[:SIMILAR]-(p3), (p2)-[:SIMILAR]-(p4)
return distinct apoc.coll.sortNodes(a + collect(distinct b),'name'), p1, p2, p3, p4

This returns four rows when I only want one:

[[JojaMarket, PierreStore], apple, orange, banana, kiwi]
[[JojaMarket, PierreStore], orange, apple, kiwi, banana]
[[JojaMarket, PierreStore], banana, kiwi, apple, orange]
[[JojaMarket, PierreStore], kiwi, banana, orange, apple]

How can I get Neo4j to return each pattern only once?


Solution

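  • For this kind of issue, where a symmetric pattern returns the same match several times in different orders, add restrictions on the ids of the nodes to rule out the mirrored matches. Your pattern is symmetric in two independent ways: swapping the stores (a with b, together with p1 with p3 and p2 with p4), and swapping the products inside each bundle (p1 with p2, together with p3 with p4). Each swap produces another valid match, which is why you get 2 × 2 = 4 rows. Requiring id(a) < id(b) and id(p1) < id(p2) keeps only one of the four. The id comparison also gives a and b a defined order, so you can use it instead of sorting them with apoc.coll.sortNodes.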

    Try this:

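    MATCH (a:Store)-[:SELLS]->(p1:Product)-[:BUNDLED]-(p2:Product)<-[:SELLS]-(a),
          (b:Store)-[:SELLS]->(p3:Product)-[:BUNDLED]-(p4:Product)<-[:SELLS]-(b),
          (p1)-[:SIMILAR]-(p3), (p2)-[:SIMILAR]-(p4)
    // id(a) < id(b) removes the match with the two stores swapped;
    // id(p1) < id(p2) removes the match with the products in each bundle swapped
    WHERE id(a) < id(b) AND id(p1) < id(p2)
    RETURN DISTINCT [a, b], p1, p2, p3, p4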
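To see why the two id filters collapse four rows into one, here is a minimal brute-force model of the pattern in plain Python (not Neo4j's matcher). It assumes a hypothetical toy graph built from the question's output: JojaMarket sells apple and orange, PierreStore sells banana and kiwi, and the integer ids in `NODE_ID` stand in for `id()`. The question does not say which store sells which products, so that part is an assumption.

```python
from itertools import product

# Hypothetical toy graph built from the question's output. Which store sells
# which bundle is an assumption; the question does not say.
# Integer ids stand in for Neo4j's id(); only their relative order matters.
NODE_ID = {"JojaMarket": 0, "PierreStore": 1,
           "apple": 2, "orange": 3, "banana": 4, "kiwi": 5}
STORES = ["JojaMarket", "PierreStore"]
PRODUCTS = ["apple", "orange", "banana", "kiwi"]
SELLS = {("JojaMarket", "apple"), ("JojaMarket", "orange"),
         ("PierreStore", "banana"), ("PierreStore", "kiwi")}
# BUNDLED and SIMILAR are matched without direction, so store unordered pairs.
BUNDLED = {frozenset(("apple", "orange")), frozenset(("banana", "kiwi"))}
SIMILAR = {frozenset(("apple", "banana")), frozenset(("orange", "kiwi"))}


def match(break_symmetry: bool) -> list:
    """Enumerate every binding of (a, b, p1, p2, p3, p4) that fits the pattern."""
    rows = []
    for a, b in product(STORES, repeat=2):
        for p1, p2, p3, p4 in product(PRODUCTS, repeat=4):
            # (a)-[:SELLS]->(p1), (p2)<-[:SELLS]-(a), and the same for b
            if not {(a, p1), (a, p2), (b, p3), (b, p4)} <= SELLS:
                continue
            # (p1)-[:BUNDLED]-(p2) and (p3)-[:BUNDLED]-(p4)
            if frozenset((p1, p2)) not in BUNDLED or frozenset((p3, p4)) not in BUNDLED:
                continue
            # (p1)-[:SIMILAR]-(p3) and (p2)-[:SIMILAR]-(p4)
            if frozenset((p1, p3)) not in SIMILAR or frozenset((p2, p4)) not in SIMILAR:
                continue
            # WHERE id(a) < id(b) AND id(p1) < id(p2)
            if break_symmetry and not (NODE_ID[a] < NODE_ID[b]
                                       and NODE_ID[p1] < NODE_ID[p2]):
                continue
            rows.append((a, b, p1, p2, p3, p4))
    return rows


print(len(match(break_symmetry=False)))  # 4
print(match(break_symmetry=True))
# [('JojaMarket', 'PierreStore', 'apple', 'orange', 'banana', 'kiwi')]
```

Without the filters the model yields the same four orderings as the question. Keeping only one half of the WHERE leaves two rows, one for each swap that is still unbroken, which is why both conditions are needed.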