recursiongraphneo4jtreecypher# How to (Efficiently) Find Subpaths in Recursive Trees through Cypher (Graph vs Regex)

## References

#### Query

I'm trying to reproduce a Go SGF game tree in Neo4j through Cypher so I can pattern-search through, as a graph. This is the shape of a Go game tree — based on Sabaki's SGF Parser —:

```
type GameTree = {
id: number;
data: { [key: string]: string[] };
parentId: number | null;
children: GameTree[];
};
```

You can find a reproducible sample of the problem in this commit.

So far, I've successfully managed to model it just like it appears in a Go editor's game tree:

`B`

and`W`

mean Black and White moves.`AB`

and`AW`

are "added black or white stones", for when editing the board position. For this question's purposes, only`move`

matters actually, which has the same value as either`B`

or`W`

. Board coordinates are given in 2 letters, e.g. column`m`

and row`r`

gives out the string`'rm'`

.

Now, what I would like to do is return the full path (probably the root node would suffice actually) from the root if a specified subpath (no skipping) is found. So far, I have this, which does work:

```
MATCH p=(g:GameNode)-[:NEXT_MOVE*]->()
WITH g, p,
[m in TAIL(NODES(p)) | m.move] AS moves
WITH g, p,
REDUCE(path = '', move in moves | path + move) AS joined_moves
WHERE joined_moves CONTAINS "rmro"
RETURN g, p, joined_moves
```

Following @cybersam's answer, I think this could make it more explicit to Neo4j so it uses an index on `m.move`

— e.g. `CREATE INDEX move_node_move_idx FOR (m:MoveNode) ON (m.move)`

—:

```
MATCH p=(g:GameNode)-[:NEXT_MOVE*]->(m1:MoveNode)-[:NEXT_MOVE*]->()
WHERE m1.move = HEAD(['rm', 'ro'])
WITH g, p,
[m in TAIL(NODES(p)) | m.move] AS moves
WITH g, p,
REDUCE(path = '', move in moves | path + move) AS joined_moves
WHERE joined_moves CONTAINS "rmro"
RETURN g, p, joined_moves
```

But I really don't know if this type of pattern-search would be efficient enough through Neo4j or other graph databases. I think it could be since, even though most games have 200+ move branches, there aren't that many of them usually. And the analysis would be on unidirectional branches where order matters (no skipping), so it should probably limit the search space even more.

Alternatively, I'm considering placing a string with the full path to it in each move node. This way I could use Regexes (I know Regexes can also be modeled as graphs) instead of path searches. That said, if it works, then I think using a SQL database would be probably enough. However, I would probably stick to Neo4j still, cause modeling this as a graph could provide more useful features in the future.

Solution

The query below should be fairly performant in returning the moves in every game that contains the desired sequence of moves. It assumes you:

- Have an index on
`MoveNode.move`

, - Pass the list of desired moves in the
`$moves`

parameter, - Adjust the length of the variable-length relationship in the first
`MATCH`

to be 1 less than the length of`$moves`

. For example, if`$moves`

has 2 items, use`*1`

instead of`*2`

.*This is admittedly ugly, but Cypher does not support dynamic bounds.*

```
MATCH p1=(m1:MoveNode)-[:NEXT_MOVE*2]->(m2)
WHERE m1.move = HEAD($moves) AND [m IN TAIL(NODES(p1)) | m.move] = TAIL($moves)
MATCH p2 = (m2)-[:NEXT_MOVE*0..]->(end) WHERE NOT (end)-[:NEXT_MOVE]->()
MATCH p3 = (g:GameNode)-[:NEXT_MOVE*]->(m1)
RETURN [a IN NODES(p3)[1..-1] | a.move] + $moves + [b IN NODES(p2)[1..] | b.move]
```

The index is used to limit and anchor the search by quickly finding the `m1`

nodes (the `MoveNode`

s matching the first item in `$moves`

), instead of wading through all paths from every `GameNode`

. Then the query:

- Filters
`m1`

by requiring that it is followed by a subpath satisfying the rest of the`$moves`

list, - Finds the sequence of moves that follow each subpath,
- Works backwards from the surviving
`m1`

nodes to find the corresponding`GameNode`

s. - Constructs and returns the sequence of moves for each matching game.

So, this query starts off by using the index to find the first set of relevant nodes, and keeps adding relationships (and end nodes) to what was already found until it finally has the desired results. There is never a need to scan through irrelevant data.

- How to loop over a multidimensional objectArray with Smarty?
- TypeScript recursive union function type
- Time complexity for recursive binary search that also prints current subarray
- How can I fix an error the error `Cannot build rewrite system for generic signature; rule length limit exceeded` in swift?
- Find the parent element of an array by the id of its child
- recursion on nested list need some support
- How do you prove termination of a recursive list length?
- python recursion i dont understand this output pls help me
- Why does recursively running `joblib.Parallel` increases the computation time?
- I'm trying to understand recursion in Tcl, but every time the recursion finishes it throws errors
- How to remove all numbers from a list in Lisp
- Calculate the total weight of segments
- Why does the expression !(cin>>word) cause infinite recursion?
- What's the best way to recursively traverse a BinaryTree in Java without void methods?
- can anybody explains the code and also recursion tree
- Is it possible to limit the depth of a recursive directory listing in S3 bucket?
- takeWhile implementation in JavaScript - Looking for better ideas
- I used a recursive function to modify a string, but if the string is too large the function returns nothing
- How to Compress Paths in SQL Server Using Recursive CTE with Node Visibility Conditions?
- Why this recursion example in ocaml doesn't work for negative number?
- Recursive loop on a list to print xml with indent
- Big O of algorithm that steps over array recursively
- Powerset of a list with equal elements in Java
- Recursion in FP-Growth Algorithm
- Passing list to function results in no case clause matching
- Convert tree-like structure formatted as a string using indentations to a list of paths
- Recursive function fails depending on lexical scoping
- Creating a list of interleaved elements: Prolog
- How to get the recursive difference formula between two columns in excel
- Get all values and associative keys as a flat array from a multidimensional array of variable depth/structure