Search code examples
neo4jcyphertraversal

Create graph from subset of existing nodes


I've got a directed Neo4j graph containing 2 kinds of nodes: nodes with labels in Set 1 and nodes with labels in Set 2. I would like to create new edges (of a new type) between nodes in Set 1, whenever there is a directed path from a Set 1 node to another Set 1 node that passes only through Set 2 nodes (possibly 0 such Set 2 nodes).

Here's an example data set:

CREATE (a:A {id:"a"})-[:CONN]->(t1:T {id:"t1"}),
   (t1)-[:CONN]->(b1:B {id:"b1"}),
   (b1)-[:CONN]->(t2:U {id:"t1"}),
   (t2)-[:CONN]->(c1:C {id:"c1"}),
   (c1)-[:CONN]->(t3:T {id:"t3"}),
   (t3)-[:CONN]->(d1:D {id:"d1"}),
   (t3)-[:CONN]->(d2:D {id:"d2"}),
   (d1)-[:CONN]->(t4:T {id:"t4"}),
   (d2)-[:CONN]->(t4),
   (t4)-[:CONN]->(e1:E {id:"e1"}),
   (t4)-[:CONN]->(e2:E {id:"e2"})

In this example, A, B, C, D, & E are in Set 1, and T & U are in Set 2, so I want to draw new :AGG edges like so:

MATCH (a:A {id:"a"}), (b1:B {id:"b1"}), (c1:C {id:"c1"}), (d1:D {id:"d1"}),
      (d2:D {id:"d2"}), (e1:E {id:"e1"}), (e2:E {id:"e2"})
CREATE (a)-[:AGG]->(b1),
   (b1)-[:AGG]->(c1),
   (c1)-[:AGG]->(d1),
   (c1)-[:AGG]->(d2),
   (d1)-[:AGG]->(e1),
   (d1)-[:AGG]->(e2),
   (d2)-[:AGG]->(e1),
   (d2)-[:AGG]->(e2)

With respect to the CONN edges, I know the graph is a DAG, so I don't have to worry about loops.

Can this be done in Cypher? Or can someone suggest an efficient way through the Java interfaces (e.g. traversal strategy)? Thanks.


Solution

  • Yes, it can be done - this is a complex query so you might have to play with it a bit, but here's something to start off with. Perhaps someone else can come along and improve on this, but I think this should accomplish most of the basic logic.

    MATCH p=(node1)-[*]-(node2)
    WHERE ('A' in labels(node1) OR
           'B' in labels(node1) OR
           'C' in labels(node1) OR
           'D' in labels(node1) OR
           'E' in labels(node1)) 
          AND
          ('A' in labels(node2) OR
           'B' in labels(node2) OR
           'C' in labels(node2) OR
           'D' in labels(node2) OR
           'E' in labels(node2)) 
          AND
          (length(p) = 1 OR 
           all(intermedNode in 
               filter(n IN tail(nodes(p)) WHERE n <> last(nodes(p)))
               WHERE
               ('T' in labels(intermedNode) OR
                'U' in labels(intermedNode))))
    WITH node1, node2
    CREATE node1-[:MyNewNiftyEdge]->node2;
    

    Explanation:

    1. We're looking for paths; the first two big WHERE blocks are just proving that the source and target of the path both have to be in "Set1".
    2. The last part of the WHERE block does everything interesting. Paths of length 1 are OK (zero intermediate "Set2" nodes). But if there are intermediate nodes, what we do is check to see if all of the "internal" nodes are "Set2". The filter bit with the tail expression just chops off the first and last nodes from the path (we already know that's node1 and node2). Then the all expression insists that if there's anything in the middle, it must be a Set2 node (either labeled "T" or "U").
    3. Finally, with those two matched nodes, we create a nifty new relationship connecting them directly.