neo4j query-optimization cypher cartesian-product

Optimize cypher query to avoid cartesian product

The query purpose is pretty trivial. For a given nodeId(userId) I want to return on the graph all nodes which has relaionship within X hops and I want to aggregate and return the distance(param which set on the relationship) between them)

I came up with this:

MATCH p=shortestPath((user:FOLLOWERS{userId:{1}})-[r:follow]-(f:FOLLOWERS)) " +
                        "WHERE f <> user " +
                        "RETURN (f.userId) as userId," +                     
                        "reduce(s = '', rel IN r | s + rel.dist + ',') as dist," +
                        "length(r) as hop"

userId({1}) is given as Input and is indexed.

I believe Iam having here cartesian product. how would you suggest avoiding it?

Solution

You can make the cartesian product less onerous by creating an index on :FOLLOWERS(userId) to speed up one of the two "legs" of the cartesian product:

CREATE INDEX ON :FOLLOWERS(userId);

Even though this will not get rid of the cartesian product, it will run in O(N log N) time, which is much faster than O(N ^ 2).

By the way, your r relationship needs to be variable-length in order for your query to work. You should specify a reasonable upper bound (which depends on your DB) to assure that the query will finish in a reasonable time and not run out of memory. For example:

MATCH p=shortestPath((user:FOLLOWERS { userId: 1 })-[r:follow*..5]-(f:FOLLOWERS))
WHERE f <> user
RETURN (f.userId) AS userId,
  REDUCE (s = '', rel IN r | s + rel.dist + ',') AS dist,
  LENGTH(r) AS hop;