neo4j, neo4j-apoc

Why does my apoc.refactor.cloneNodes call iterate and create clones for every node in the graph?


I intended to clone a single node and its 3 connections, but ended up with multiple clones.

Because I first MATCH the whole pattern of the primary node plus its related nodes, apoc.refactor.cloneNodes seems to run once per related node instead of once for the primary node I want to clone. The result is the original primary node plus 3 clones (instead of the intended 1 clone), each connected to the expected related nodes.

. . .

I created this toy graph:

create (a:Node {description:"Spider Man Series"})
create (b:Node {description:"Spidey"})
create (c:Node {description:"Doc Oc"})
create (d:Node {description:"Venom"})
create (a)-[:BELONGS]->(b)
create (a)-[:BELONGS]->(c)
create (a)-[:BELONGS]->(d)
return a,b,c,d

I want to clone "Spider Man Series" (and its relationships):

match (a)-[c]-(b)
where a.description="Spider Man Series"
call apoc.refactor.cloneNodes([a],true) yield output
return a,b,c, output

But this creates 3 clones (one for each related character node). I'm guessing it has something to do with the MATCH including a relationship.

If I limit my MATCH to the node alone, with no relationship pattern, I get the proper clone behavior (the original "Spider Man Series" and the clone "Spider Man Series" with cloned relationships). I'm confused because the WHERE clause matches only 1 node, which is bound to (a).

match (a)
where a.description="Spider Man Series"
call apoc.refactor.cloneNodes([a],true) yield output
return a,output

. . .

I tried limiting the related nodes to 2 instead of everything "Spider Man Series" was connected to, but this ALSO gave me a clone for each related node:

match (a)-[c]-(b)
where a.description="Spider Man Series" and b.description in ['Spidey','Venom']
call apoc.refactor.cloneNodes([a],true) yield output
return a,b,c, output

Solution

  • apoc.refactor.cloneNodes takes the nodes you give it and creates copies of them. If you pass true as the second parameter, it also copies the relationships from the old nodes to the new nodes.

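    You're seeing duplication because, as you suspect, that first MATCH returns multiple rows, one per matched relationship, and CALL runs the procedure once for each incoming row. Every row carries the same a, so it gets cloned once per row. One approach is to reduce the rows to the DISTINCT a nodes before you do the clone: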

    match (a)-[c]-(b)
    where a.description="Spider Man Series"
    WITH distinct a as da
    call apoc.refactor.cloneNodes([da],true) yield output
    return output
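
    To see why the original query cloned three times, it can help to count the rows that reach the CALL. A minimal sketch, assuming a fresh copy of the toy graph from the question (the aliases series and rowsBeforeCall are only illustrative):

    ```cypher
    // Each matched relationship yields one row, and CALL runs once per row.
    match (a)-[c]-(b)
    where a.description="Spider Man Series"
    return a.description as series, count(*) as rowsBeforeCall
    ```

    On that toy graph this should report 3 for the unfiltered pattern, which matches the 3 clones. Adding WITH distinct a before the count should bring it down to 1.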
    


    However, if you want a complete copy of the subgraph, i.e. two 'Spider Man Series' nodes that each have their own three character nodes, with the two subgraphs not connected to each other, then something like apoc.refactor.cloneSubgraphFromPaths will work better:

    match path=(a)-[c]-(b)
    where a.description="Spider Man Series"
    with collect(path) as paths
    call apoc.refactor.cloneSubgraphFromPaths(paths) YIELD output
    return output
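
    To check which of the two results you ended up with, a quick sketch (assuming the :Node label and description property from the toy graph; the alias copies is only illustrative) is to count nodes per description:

    ```cypher
    // Counts how many nodes share each description after cloning.
    match (n:Node)
    return n.description as description, count(*) as copies
    order by description
    ```

    Starting from a fresh toy graph, the cloneNodes approach should leave two 'Spider Man Series' nodes and one of each character, while cloneSubgraphFromPaths should leave two of everything.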
    
