neo4j, cypher, neo4j-apoc

How to mark all descendants of all source nodes?


In a directed graph, source nodes are the nodes with in-degree 0. I want to run the following procedure:

  1. Pick one source node.
  2. For each of its descendants, record that it is a descendant of that source node, along with the number of hops from the source (perhaps using breadth-first search).
  3. Repeat from step 1 until every source node has been processed.
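
For reference, the three steps above can be sketched outside the database as a plain breadth-first search. This is only an illustration in Python on a small hypothetical edge list (the function name `mark_descendants` and the node names are assumptions, not part of any Neo4j API); it shows the result the Cypher queries are meant to produce, namely one hop count per (source, descendant) pair.

```python
from collections import deque

def mark_descendants(edges):
    """Return {node: {source: hops}} for every source (in-degree 0) node."""
    children = {}
    has_parent = set()
    nodes = set()
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
        has_parent.add(child)
        nodes.update((parent, child))

    # Source nodes are the nodes that never appear as the target of an edge.
    sources = sorted(nodes - has_parent)

    marks = {node: {} for node in nodes}
    for source in sources:                      # steps 1 and 3: one pass per source
        queue = deque([(source, 0)])
        seen = {source}
        while queue:                            # step 2: breadth-first search
            node, hops = queue.popleft()
            for child in children.get(node, []):
                if child not in seen:           # first visit is the shortest route
                    seen.add(child)
                    marks[child][source] = hops + 1
                    queue.append((child, hops + 1))
    return marks

# Hypothetical example graph: two chains that merge at node B.
edges = [("Ooker", "A"), ("A", "B"), ("B", "C"),
         ("Ooker2", "D"), ("D", "E"), ("E", "B")]
marks = mark_descendants(edges)
print(marks["B"])
```

Here each node ends up with one entry per source that can reach it, which corresponds to one property per source node in the graph database.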

I can search for all source nodes, using:

CALL gds.graph.create( 'RELREV', 'note', {REL: {orientation: 'REVERSE'}});
CALL gds.degree.write('RELREV', { writeProperty: 'indegree' });
MATCH (n) WHERE n.indegree = 0 RETURN n;

I can mark all the descendants of one source node (say node A), using:

match p=(start {name: 'nodename'})-[:REL*..8]->(n) 
with n, collect(length(p)) as collect 
unwind collect as hoplist 
with n, collect, min(hoplist) as min 
set n.A=min return n.name, collect, n.A;

But I can't find a way to do this automatically for all the source nodes, mostly because I don't know how to set property names dynamically: property A for source node A, property B for source node B, and so on. I guessed that parameters would be the place to start, but the documentation on them doesn't seem to help.


Solution

  • I would go along these lines (not tested, by the way), using APOC and assuming that you have a set of 'source' nodes.

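    // get paths in an efficient way, i.e. only paths that run all the way to a 'leaf'
    // (source is assumed to be restricted to your source nodes already)
    MATCH p=(source)-[:REL*]->(n)
    WHERE NOT EXISTS((n)-[:REL]->())
    
    // create one row per node on each path; the number of rows equals the length of the path
    UNWIND RANGE(1,LENGTH(p)) AS i
    WITH source,i,nodes(p)[i] AS descendant 
    
    // a descendant can sit on several paths; keep its smallest hop count
    WITH source,descendant,min(i) AS i
    
    // set a property named after the source, using apoc
    CALL apoc.create.setProperty(descendant,source.name,i) YIELD node
    
    // check the results
    RETURN source.name, properties(node)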
    

    Additional explanation

    • The WHERE NOT EXISTS((n)-[:REL]->()) clause excludes paths that you do not need to investigate. For example, if you have (source)-[:REL]->(d1)-[:REL]->(d2), you do not need to investigate (source)-[:REL]->(d1) separately, because it is a prefix of the longer path.

    • Take an example graph (shown as an image in the original answer) with two source nodes, Ooker and Ooker2.

    This part of the query

    MATCH (source) WHERE source.name CONTAINS 'Ooker'
    MATCH p=(source)-[:REL*]->(n)
    WHERE NOT EXISTS((n)-[:REL]->())
    UNWIND RANGE(1,LENGTH(p)) AS i
    WITH source,i,nodes(p)[i] AS descendant 
    RETURN source.name, i, descendant.name
    

    returns this:

    ╒═════════════╤═══╤═════════════════╕
    │"source.name"│"i"│"descendant.name"│
    ╞═════════════╪═══╪═════════════════╡
    │"Ooker2"     │1  │"D"              │
    ├─────────────┼───┼─────────────────┤
    │"Ooker2"     │2  │"E"              │
    ├─────────────┼───┼─────────────────┤
    │"Ooker2"     │3  │"B"              │
    ├─────────────┼───┼─────────────────┤
    │"Ooker2"     │4  │"C"              │
    ├─────────────┼───┼─────────────────┤
    │"Ooker"      │1  │"A"              │
    ├─────────────┼───┼─────────────────┤
    │"Ooker"      │2  │"B"              │
    ├─────────────┼───┼─────────────────┤
    │"Ooker"      │3  │"C"              │
    └─────────────┴───┴─────────────────┘
    

    If you feed source, i and descendant into the apoc.create.setProperty call, it sets the right properties: for example, node B gets Ooker = 2 and Ooker2 = 3.
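
To see why matching only paths that end in a leaf still covers every descendant, the same idea can be sketched in plain Python. This is an illustration under assumptions, not Neo4j code: the helper names `leaf_paths` and `hops_from_leaf_paths` are made up, the graph is a small hypothetical adjacency map, and the graph is assumed to be acyclic. The inner loop mirrors `UNWIND RANGE(1,LENGTH(p)) AS i`. Because a descendant reachable by several routes can appear at different positions, the sketch keeps the smallest position as its hop count.

```python
def leaf_paths(children, start):
    """Yield every path from start to a node that has no outgoing edge."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        nexts = children.get(path[-1], [])
        if not nexts and len(path) > 1:
            yield path                      # reached a leaf: emit the full path
        for child in nexts:
            if child not in path:           # guard so a cycle cannot loop forever
                stack.append(path + [child])

def hops_from_leaf_paths(children, start):
    """Map each descendant of start to its smallest position on any leaf path."""
    hops = {}
    for path in leaf_paths(children, start):
        for i in range(1, len(path)):       # one "row" per node on the path
            node = path[i]
            hops[node] = min(i, hops.get(node, i))
    return hops

# Hypothetical graph: Y is reachable from S directly and via X.
children = {"S": ["X", "Y"], "X": ["Y"], "Y": ["Z"]}
result = hops_from_leaf_paths(children, "S")
print(result)
```

Only two paths are enumerated here (S, X, Y, Z and S, Y, Z), yet every descendant of S receives a hop count, because each shorter path is a prefix of one that ends in a leaf.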