I'm trying to reproduce a Go SGF game tree in Neo4j through Cypher so I can pattern-search through, as a graph. This is the shape of a Go game tree — based on Sabaki's SGF Parser —:
type GameTree = {
id: number;
data: { [key: string]: string[] };
parentId: number | null;
children: GameTree[];
};
You can find a reproducible sample of the problem in this commit.
So far, I've successfully managed to model it just like it appears in a Go editor's game tree:
B
andW
mean Black and White moves.AB
andAW
are "added black or white stones", for when editing the board position. For this question's purposes, onlymove
matters actually, which has the same value as eitherB
orW
. Board coordinates are given in 2 letters, e.g. columnm
and rowr
gives out the string'rm'
.
Now, what I would like to do is return the full path (probably the root node would suffice actually) from the root if a specified subpath (no skipping) is found. So far, I have this, which does work:
MATCH p=(g:GameNode)-[:NEXT_MOVE*]->()
WITH g, p,
[m in TAIL(NODES(p)) | m.move] AS moves
WITH g, p,
REDUCE(path = '', move in moves | path + move) AS joined_moves
WHERE joined_moves CONTAINS "rmro"
RETURN g, p, joined_moves
Following @cybersam's answer, I think this could make it more explicit to Neo4j so it uses an index on m.move
— e.g. CREATE INDEX move_node_move_idx FOR (m:MoveNode) ON (m.move)
—:
MATCH p=(g:GameNode)-[:NEXT_MOVE*]->(m1:MoveNode)-[:NEXT_MOVE*]->()
WHERE m1.move = HEAD(['rm', 'ro'])
WITH g, p,
[m in TAIL(NODES(p)) | m.move] AS moves
WITH g, p,
REDUCE(path = '', move in moves | path + move) AS joined_moves
WHERE joined_moves CONTAINS "rmro"
RETURN g, p, joined_moves
But I really don't know if this type of pattern-search would be efficient enough through Neo4j or other graph databases. I think it could be since, even though most games have 200+ move branches, there aren't that many of them usually. And the analysis would be on unidirectional branches where order matters (no skipping), so it should probably limit the search space even more.
Alternatively, I'm considering placing a string with the full path to it in each move node. This way I could use Regexes (I know Regexes can also be modeled as graphs) instead of path searches. That said, if it works, then I think using a SQL database would be probably enough. However, I would probably stick to Neo4j still, cause modeling this as a graph could provide more useful features in the future.
The query below should be fairly performant in returning the moves in every game that contains the desired sequence of moves. It assumes you:
MoveNode.move
,$moves
parameter,MATCH
to be 1 less than the length of $moves
. For example, if $moves
has 2 items, use *1
instead of *2
. This is admittedly ugly, but Cypher does not support dynamic bounds.MATCH p1=(m1:MoveNode)-[:NEXT_MOVE*2]->(m2)
WHERE m1.move = HEAD($moves) AND [m IN TAIL(NODES(p1)) | m.move] = TAIL($moves)
MATCH p2 = (m2)-[:NEXT_MOVE*0..]->(end) WHERE NOT (end)-[:NEXT_MOVE]->()
MATCH p3 = (g:GameNode)-[:NEXT_MOVE*]->(m1)
RETURN [a IN NODES(p3)[1..-1] | a.move] + $moves + [b IN NODES(p2)[1..] | b.move]
The index is used to limit and anchor the search by quickly finding the m1
nodes (the MoveNode
s matching the first item in $moves
), instead of wading through all paths from every GameNode
. Then the query:
m1
by requiring that it is followed by a subpath satisfying the rest of the $moves
list,m1
nodes to find the corresponding GameNode
s.So, this query starts off by using the index to find the first set of relevant nodes, and keeps adding relationships (and end nodes) to what was already found until it finally has the desired results. There is never a need to scan through irrelevant data.