The query purpose is pretty trivial. For a given nodeId(userId) I want to return on the graph all nodes which has relaionship within X hops and I want to aggregate and return the distance(param which set on the relationship) between them)
I came up with this:
MATCH p=shortestPath((user:FOLLOWERS{userId:{1}})-[r:follow]-(f:FOLLOWERS)) " +
"WHERE f <> user " +
"RETURN (f.userId) as userId," +
"reduce(s = '', rel IN r | s + rel.dist + ',') as dist," +
"length(r) as hop"
userId({1}) is given as Input and is indexed.
I believe Iam having here cartesian product. how would you suggest avoiding it?
You can make the cartesian product less onerous by creating an index on :FOLLOWERS(userId)
to speed up one of the two "legs" of the cartesian product:
CREATE INDEX ON :FOLLOWERS(userId);
Even though this will not get rid of the cartesian product, it will run in O(N log N) time, which is much faster than O(N ^ 2).
By the way, your r
relationship needs to be variable-length in order for your query to work. You should specify a reasonable upper bound (which depends on your DB) to assure that the query will finish in a reasonable time and not run out of memory. For example:
MATCH p=shortestPath((user:FOLLOWERS { userId: 1 })-[r:follow*..5]-(f:FOLLOWERS))
WHERE f <> user
RETURN (f.userId) AS userId,
REDUCE (s = '', rel IN r | s + rel.dist + ',') AS dist,
LENGTH(r) AS hop;