How to fetch a path by a given breadcrumb string in Cypher (Neo4j)?

Initial situation

Rooted Tree, representing a directory like structure
- root node
- no cycles
- non binary
- directory names are not unique
- directories are modeled, but no „files“
- graph is connected
- graph is directed from root to leaves
Model
- (:Root)-[:CONTAINS]->(:Directory)-[:CONTAINS*]->(:Directory)
Neo4j 3.5.11
about 20 levels deep

CREATE
    (root:Root {name:'Root'}),
    (dirA:Directory {name:'dir A'}),
    (dirB:Directory {name:'dir B'}),
    (dirC:Directory {name:'dir C'}),
    (dirD:Directory {name:'dir D'}),
    (dirE:Directory {name:'dir E'}),
    (dirF:Directory {name:'dir F'}),
    (dirG:Directory {name:'dir G'}),
    (root)-[:CONTAINS]->(dirA),
    (root)-[:CONTAINS]->(dirB),
    (dirA)-[:CONTAINS]->(dirC),
    (dirA)-[:CONTAINS]->(dirD),
    (dirD)-[:CONTAINS]->(dirE),
    (dirD)-[:CONTAINS]->(dirF),
    (dirD)-[:CONTAINS]->(dirG);

level of freedom
- :Root label can be modeled also as (:Directory name:’Root’) if necessary
- the apoc library is highly welcomed

Given input parameter

row list of directly linked directory names
- varying amount of directories
- adjacent directories of the breadcrumb string are also directly linked in the tree / graph

example:

WITH 'dir A/dir D/dir G' as inputString
WITH split(inputString, '/') AS directories
UNWIND
    directories AS directory
RETURN
    directory;

╒═══════════╕
│"directory"│
╞═══════════╡
│"dir A"    │
├───────────┤
│"dir D"    │
├───────────┤
│"dir G"    │
└───────────┘

Challenge to be solved

For a specified breadcrumb string ("dir A/dir D/dir G") I need its representing path in Cypher, which will be part of a more complex query. I can not just search the tree for the last directory entry ("dir G") of the breadcrumb, because the directory names are not unique. How can my request be realized in Cypher?

Expected result:

╒═══════════════════════════════════════════════════════════════════════════════════════════════════════════════╕
│"path"                                                                                                         │
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│[{"name":"Root"},{},{"name":"dir A"},{"name":"dir A"},{},{"name":"dir D"},{"name":"dir D"},{},{"name":"dir G"}]│
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Solution

For this case, I suggest having each :Directory node have the full path as a property, it will make matching to directories and their paths easier:

MATCH path = (:Root)-[:CONTAINS*]->(d:Directory)
WITH d, [node in tail(nodes(path)) | node.name] as directories
WITH d, apoc.text.join(directories, '/') as pathString
SET d.path = pathString

(you can use a similar query to update a directory (and its children) in cases where a directory is moved in the tree)

With this set, it makes it easy to match to the end node of a path, even if you aren't supplying the part of the path above the interested path (you didn't mention if the path you supply always extends from the root or if it's just the tail end of a path):

WITH 'dir A/dir D/dir G' as inputString
MATCH (end:Directory)
WHERE end.path ENDS WITH inputString
RETURN end

So if :DIRECTORY(path) is indexed, then you have fast access to the end node. Now to find the others.

We can use a variable-length path expression to find the full path of these nodes, using an all() predicate to ensure every node in the path has a name from the split input, and this is checked during expansion. This gets us paths of the nodes we want (wasting just a single additional traversal to a parent above), but it doesn't guarantee the order, we have to filter on that after.

This should work with your example graph:

WITH 'dir A/dir D/dir G' as inputString
WITH inputString, split(inputString, '/') as dirNames
MATCH (end:Directory)
WHERE end.path ENDS WITH inputString
MATCH path = (start)-[:CONTAINS*]->(end)
WHERE all(node in nodes(path) WHERE node.name IN dirNames)
WITH path
WHERE length(path) + 1 = size(dirNames) AND [node in nodes(path) | node.name] = dirNames
RETURN path