Search code examples
neo4jquery-optimizationcyphercartesian-product

Optimize cypher query to avoid cartesian product


The query purpose is pretty trivial. For a given nodeId(userId) I want to return on the graph all nodes which has relaionship within X hops and I want to aggregate and return the distance(param which set on the relationship) between them)

I came up with this:

MATCH p=shortestPath((user:FOLLOWERS{userId:{1}})-[r:follow]-(f:FOLLOWERS)) " +
                        "WHERE f <> user " +
                        "RETURN (f.userId) as userId," +                     
                        "reduce(s = '', rel IN r | s + rel.dist + ',') as dist," +
                        "length(r) as hop"

userId({1}) is given as Input and is indexed.

I believe Iam having here cartesian product. how would you suggest avoiding it?


Solution

  • You can make the cartesian product less onerous by creating an index on :FOLLOWERS(userId) to speed up one of the two "legs" of the cartesian product:

    CREATE INDEX ON :FOLLOWERS(userId);
    

    Even though this will not get rid of the cartesian product, it will run in O(N log N) time, which is much faster than O(N ^ 2).

    By the way, your r relationship needs to be variable-length in order for your query to work. You should specify a reasonable upper bound (which depends on your DB) to assure that the query will finish in a reasonable time and not run out of memory. For example:

    MATCH p=shortestPath((user:FOLLOWERS { userId: 1 })-[r:follow*..5]-(f:FOLLOWERS))
    WHERE f <> user
    RETURN (f.userId) AS userId,
      REDUCE (s = '', rel IN r | s + rel.dist + ',') AS dist,
      LENGTH(r) AS hop;