I have a graph in neo4j, and for a given node N, I would like to find all nodes that are reachable in a path of no longer than P steps from N, as well as all links between that set of nodes. It seems like this could be possible with either Cypher or the Traversal framework; is one preferred over the other? I'm doing this from Java with an embedded database, and I will need to perform further queries on the subgraph. I've poked around and haven't found any conclusive answers.
I think cypher is the most concise way of getting your desired data, querying for variable length paths, some collecting and refining:
If n is the internal id of your node N
and your P
is 5:
START begin = node(n) // or e.g. index lookup
MATCH p = (begin)<-[r*..5]-(end) // match all paths of length up to 5
WITH distinct nodes(p) as nodes // collect the nodes contained in the paths
MATCH (x)<-[r]-(y) // find all relationships between nodes
WHERE x in nodes and y in nodes // which were found earlier
RETURN distinct x,r,y // and deduplicate as you find all pairs twice
It might not be the most efficient way, but at least the execution plan explanation in http://console.neo4j.org/ suggests that the y in nodes
is considered before the MATCH (x)-[r]-(y)
.
I couldn't think of a way to avoid matching the relationships twice, therefore the distinct
in the return statement.