This is based on a sample Cypher query from the Neo4j documentation:
MATCH (user:User)-[:ORDERS]->(:Product)<-[:ORDERS]-(otherUser:User)-[:ORDERS]->(recommended:Product)
WHERE NOT (user)-[:ORDERS]->(recommended)
AND user.id = 171
RETURN distinct recommended.id, count(distinct otherUser.id) as frequency
ORDER BY frequency DESC
LIMIT 200
The following is my improved version:
MATCH (user:User)-[:ORDERS]->(p:Product)
WHERE user.id = 171
WITH DISTINCT p, user
MATCH (p)<-[:ORDERS]-(otherUser:User)
WITH DISTINCT otherUser, user
MATCH (otherUser)-[:ORDERS]->(recommended:Product)
WHERE NOT (user)-[:ORDERS]->(recommended)
RETURN distinct recommended.id, count(distinct otherUser.id) as frequency
ORDER BY frequency DESC
LIMIT 200
Both return the same result, but the second one runs 6 times faster (although it still took 3 seconds on my MacBook).
Your query gets the p products (that you do not want to recommend), but eventually drops them. Instead of dropping them, those p nodes could be used to compare against the recommended nodes, avoiding the additional DB hits needed to process WHERE NOT (user)-[:ORDERS]->(recommended) (which has to rescan every order for user every time). That should speed up your query significantly.
Try this:
MATCH (user:User)-[:ORDERS]->(p:Product)<-[:ORDERS]-(otherUser:User)
WHERE user.id = 171
WITH COLLECT(DISTINCT otherUser) AS others, COLLECT(DISTINCT p) AS sharedProds
UNWIND others AS other
MATCH (other)-[:ORDERS]->(recommended:Product)
WHERE NOT recommended IN sharedProds
RETURN DISTINCT recommended.id, count(DISTINCT other) as frequency
ORDER BY frequency DESC
LIMIT 200;
Also, I assume that User nodes have unique id values, so I use count(DISTINCT other) instead of count(DISTINCT otherUser.id), which should be faster.
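If it helps to see why filtering against the collected shared products gives the same recommendations as the WHERE NOT pattern check, here is a rough Python model of the two approaches. The toy orders data and both function names are made up for illustration, and this only mirrors the set logic; it says nothing about actual DB hits or timings in Neo4j.

```python
from collections import Counter

# Hypothetical toy data: user id -> set of ordered product ids.
orders = {
    171: {"a", "b"},
    1: {"a", "c", "d"},
    2: {"b", "c"},
    3: {"e"},  # shares no product with user 171
}


def recommend_pattern_check(orders, uid):
    """Model of the original filter: check each candidate against all of the user's orders."""
    mine = orders[uid]
    freq = Counter()
    for other, prods in orders.items():
        if other == uid or not (prods & mine):
            continue  # only users sharing at least one product count
        for prod in prods:
            # Stands in for WHERE NOT (user)-[:ORDERS]->(recommended)
            if prod not in mine:
                freq[prod] += 1
    return freq


def recommend_shared_filter(orders, uid):
    """Model of the rewritten query: collect the shared products once, then filter by membership."""
    mine = orders[uid]
    others, shared_prods = set(), set()
    for other, prods in orders.items():
        common = prods & mine
        if other != uid and common:
            others.add(other)
            shared_prods |= common  # COLLECT(DISTINCT p)
    freq = Counter()
    for other in others:  # UNWIND others AS other
        for prod in orders[other]:
            # Stands in for WHERE NOT recommended IN sharedProds
            if prod not in shared_prods:
                freq[prod] += 1
    return freq


if __name__ == "__main__":
    print(recommend_pattern_check(orders, 171).most_common())
    print(recommend_shared_filter(orders, 171).most_common())
```

The two agree because any product that both user and some other user ordered necessarily ends up in the shared set, so nothing the original filter would exclude can slip through.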