How to speed up Collaborative Filtering in Neo4j Cypher?

This is based on sample Cypher from the Neo4j documentation:

MATCH (user:User)-[:ORDERS]->(:Product)<-[:ORDERS]-(otherUser:User)-[:ORDERS]->(recommended:Product)
WHERE NOT (user)-[:ORDERS]->(recommended)
  AND user.id = 171
RETURN distinct recommended.id, count(distinct otherUser.id) as frequency
ORDER BY frequency DESC
LIMIT 200

The following is my improved version:

MATCH (user:User)-[:ORDERS]->(p:Product)
WHERE user.id = 171
WITH DISTINCT p, user
MATCH (p)<-[:ORDERS]-(otherUser:User)
WITH DISTINCT otherUser, user
MATCH (otherUser)-[:ORDERS]->(recommended:Product)
WHERE NOT (user)-[:ORDERS]->(recommended)
RETURN distinct recommended.id, count(distinct otherUser.id) as frequency
ORDER BY frequency DESC
LIMIT 200

Both return the same result, but the second one runs 6 times faster (though it still takes 3 seconds on my MacBook).

Why does the second one run faster?
How can I speed it up even further?

Solution

Your query finds the p products (the ones you do not want to recommend) but then discards them. Instead of discarding them, you can keep those p nodes and compare the recommended nodes against them. This avoids the extra DB hits of WHERE NOT (user)-[:ORDERS]->(recommended), a pattern check that is re-evaluated against the user's orders for every candidate row. That should speed up your query significantly.
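As for why your second query is faster: in the first query, every (p, otherUser) path is a separate row, so an otherUser who shares several products with user has all of their orders expanded once per shared product, and the WHERE NOT check runs on each of those rows. Your WITH DISTINCT steps collapse the duplicates before the final expansion. One way to see the size of that effect on your own data is the sketch below; the aliases `paths` and `distinctOthers` are just illustrative names, and the numbers depend entirely on your graph.

```cypher
// PROFILE runs the query and reports rows and db hits per plan operator.
// If paths is much larger than distinctOthers, the single long pattern
// is carrying many duplicate rows into the next expansion.
PROFILE
MATCH (user:User)-[:ORDERS]->(p:Product)<-[:ORDERS]-(otherUser:User)
WHERE user.id = 171
RETURN count(*) AS paths, count(DISTINCT otherUser) AS distinctOthers;
```

PROFILE can also be put in front of the full queries, which lets you compare the db hits of the WHERE NOT pattern check with those of a list-membership filter directly.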

Try this:

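// Find the user's products and the other users who ordered them, in one pass
MATCH (user:User)-[:ORDERS]->(p:Product)<-[:ORDERS]-(otherUser:User)
WHERE user.id = 171
// Keep the shared products as a list so they can be excluded later without re-traversing
WITH COLLECT(DISTINCT otherUser) AS others, COLLECT(DISTINCT p) AS sharedProds
UNWIND others AS other
MATCH (other)-[:ORDERS]->(recommended:Product)
// Any product that both `other` and the user ordered is already in sharedProds,
// so this list check replaces WHERE NOT (user)-[:ORDERS]->(recommended)
WHERE NOT recommended IN sharedProds
RETURN DISTINCT recommended.id, count(DISTINCT other) as frequency
ORDER BY frequency DESC
LIMIT 200;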

Also, I assume that User nodes have unique id values, so I use count(DISTINCT other) instead of count(DISTINCT otherUser.id). Counting distinct nodes does not need to read the id property, which should be faster.
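If that assumption holds, it can also be declared in the schema. A uniqueness constraint is backed by an index, so the `user.id = 171` lookup can become an index seek instead of a scan over all User nodes. This is a sketch assuming Neo4j 4.4 or later (older versions use the `CREATE CONSTRAINT ON (u:User) ASSERT u.id IS UNIQUE` form); the constraint name `user_id_unique` is an arbitrary, hypothetical choice.

```cypher
// Neo4j 4.4+ syntax; user_id_unique is just an illustrative name.
// The backing index lets the planner seek directly to the User with a given id.
CREATE CONSTRAINT user_id_unique IF NOT EXISTS
FOR (u:User) REQUIRE u.id IS UNIQUE;
```

Creating the constraint fails if existing User nodes already share an id, which doubles as a check of the uniqueness assumption.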