I am working on an OSS project called deps-graph. I download the data from https://static.crates.io/db-dump.tar.gz, pre-process it, and connect crate versions to each other based on how they depend on one another, which means I work with a lot of data.
My creation command for a crate version looks like this (very simplified):
create (:CargoCrateVersion {id: map[0], num: map[1], features: map[2]})
The command that connects the relations looks like this:
MATCH (cv_from:CargoCrateVersion {id: map[0]}), (cv_to:CargoCrateVersion {id: map[1]}) CREATE (cv_from)-[:DEPENDS_ON {optional: map[2], default_features: map[3], with_features: map[4], target: map[5], kind: map[6]}]->(cv_to)
(Since I'm bulk inserting, I use UNWIND to supply the data in "map".)
I am now trying to query this data, but I have performance problems. I am running the following query, which traverses nodes that depend on each other:
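For readers unfamiliar with the pattern, a bulk insert with UNWIND looks roughly like the sketch below. The parameter name rows and the two sample rows are placeholders chosen for illustration, not actual deps-graph data, and storing features as a plain string is an assumption.

```cypher
GRAPH.QUERY cargo_graph "CYPHER rows=[[1, '0.1.0', 'default'], [2, '0.2.0', 'default']] UNWIND $rows AS map CREATE (:CargoCrateVersion {id: map[0], num: map[1], features: map[2]})"
```

Each inner list becomes one value of map, so map[0], map[1] and map[2] line up with the id, num and features properties in the creation command.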
GRAPH.QUERY cargo_graph "MATCH (cv: CargoCrateVersion {id: 468088})-[d:DEPENDS_ON*1..2]->(cv2) RETURN cv, COLLECT(cv2)"
As you may notice, I'm limiting the traversal to a depth of 2, because the time taken grows nearly exponentially with each additional depth level. For example, on my machine the query limited to depth 2 runs in 360 ms, depth 3 takes 700 ms, depth 5 takes 1500 ms, and so on. When I removed the limit entirely, the RedisGraph server crashed after a minute or so because the machine ran out of RAM.
I should also point out that this is one of my first projects with RedisGraph / Cypher. I have tried to research this, but I was unable to find a way to optimize the query.
How can I optimize the query to get all dependencies without crashing the database / waiting forever?
GRAPH.QUERY cargo_graph "CREATE INDEX FOR (n:CargoCrateVersion) ON (n.id)"
GRAPH.CONFIG SET QUERY_MEM_CAPACITY 1048576
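To check that both settings took effect, the execution plan and the configured limit can be inspected as in the sketch below. It assumes the same graph name and query as in the question. The expectation that the plan reports an index scan on CargoCrateVersion rather than a label scan is an assumption about how RedisGraph prints its plan, so treat the exact wording of the output as unverified.

```cypher
GRAPH.EXPLAIN cargo_graph "MATCH (cv:CargoCrateVersion {id: 468088})-[d:DEPENDS_ON*1..2]->(cv2) RETURN cv, COLLECT(cv2)"
GRAPH.CONFIG GET QUERY_MEM_CAPACITY
```

Note that QUERY_MEM_CAPACITY is given in bytes, so 1048576 caps each query at 1 MiB. A query that exceeds the cap is aborted rather than taking the server down, and the value may need to be raised if legitimate queries start failing.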
Lastly, if possible, please share the output of:
GRAPH.PROFILE cargo_graph "MATCH (cv: CargoCrateVersion {id: 468088})-[d:DEPENDS_ON*1..2]->(cv2) RETURN cv, COLLECT(cv2)"