
How to speed up Collaborative Filtering in Neo4j Cypher?


The following query is based on sample Cypher from the Neo4j documentation:

MATCH (user:User)-[:ORDERS]->(:Product)<-[:ORDERS]-(otherUser:User)-[:ORDERS]->(recommended:Product)
WHERE NOT (user)-[:ORDERS]->(recommended)
  AND user.id = 171
RETURN distinct recommended.id, count(distinct otherUser.id) as frequency
ORDER BY frequency DESC
LIMIT 200

And this is my improved version:

MATCH (user:User)-[:ORDERS]->(p:Product)
WHERE user.id = 171
WITH DISTINCT p, user
MATCH (p)<-[:ORDERS]-(otherUser:User)
WITH DISTINCT otherUser, user
MATCH (otherUser)-[:ORDERS]->(recommended:Product)
WHERE NOT (user)-[:ORDERS]->(recommended)
RETURN distinct recommended.id, count(distinct otherUser.id) as frequency
ORDER BY frequency DESC
LIMIT 200
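One plausible explanation for question 1 is a hypothesis that running both queries with PROFILE would confirm or refute. In the first query, every path (user)-->(p)<--(otherUser)-->(recommended) is its own row. An other user who shares k products with user therefore has all of their orders expanded k times, and the WHERE NOT check runs on every copy. The WITH DISTINCT steps collapse those duplicates before the last expansion. The Python sketch below counts the rows that reach WHERE NOT in each version over a small, hypothetical in-memory dataset. ORDERS and the helper names are made up for illustration, and the sketch models only logical rows, not the planner's actual operators.

```python
# Hypothetical toy data: user id -> set of product ids that user ORDERS.
ORDERS = {
    "u171": {"p1", "p2", "p3", "p4"},
    "u2": {"p1", "p2", "p3", "p4", "p5", "p6"},
    "u3": {"p1", "p2", "p3", "p5", "p7"},
}


def rows_single_match(orders, user):
    """Rows reaching WHERE NOT in the one-MATCH query: one per path
    user->p<-other->recommended. Cypher never reuses a relationship within
    one MATCH, so (with at most one ORDERS per user/product pair) other != user
    and recommended != p."""
    rows = 0
    for p in orders[user]:
        for other, prods in orders.items():
            if other != user and p in prods:
                # every order of `other` except the one to p becomes a row
                rows += len(prods) - 1
    return rows


def rows_staged(orders, user):
    """Rows reaching WHERE NOT in the WITH DISTINCT version: the final MATCH
    expands each distinct other user exactly once. Separate MATCH clauses may
    reuse a relationship, so `user` itself is among the other users here."""
    others = {o for p in orders[user] for o, prods in orders.items() if p in prods}
    return sum(len(orders[o]) for o in others)


print(rows_single_match(ORDERS, "u171"))  # 32
print(rows_staged(ORDERS, "u171"))        # 15
```

Even on this tiny dataset, the single-MATCH form pushes roughly twice as many rows through the pattern check. The gap grows with the number of products each other user shares with user 171.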

Both return the same result, but the second one runs 6 times faster (though it still takes 3 seconds on my MacBook).

  1. Why does the second one run faster?
  2. How can I speed it up even further?

Solution

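  • Your query gets the p products (which you do not want to recommend), but then drops them. Instead of dropping them, you can keep those p nodes and compare them against the recommended nodes. That avoids the extra DB hits needed to evaluate WHERE NOT (user)-[:ORDERS]->(recommended), which is checked once per candidate row and has to re-traverse user's ORDERS relationships every time. Excluding only the p products is enough: any product that user ordered and that also shows up as a recommended candidate was ordered by some otherUser, so it is already one of the p products. That should speed up your query significantly.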

    Try this:

    MATCH (user:User)-[:ORDERS]->(p:Product)<-[:ORDERS]-(otherUser:User)
    WHERE user.id = 171
    WITH COLLECT(DISTINCT otherUser) AS others, COLLECT(DISTINCT p) AS sharedProds
    UNWIND others AS other
    MATCH (other)-[:ORDERS]->(recommended:Product)
    WHERE NOT recommended IN sharedProds
    RETURN DISTINCT recommended.id, count(DISTINCT other) as frequency
    ORDER BY frequency DESC
    LIMIT 200;
    

    Also, I assume that User nodes have unique id values, so I use count(DISTINCT other) rather than count(DISTINCT otherUser.id). That should be faster because it compares nodes directly instead of reading the id property on every row.
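The answer's approach can be sketched in Python over a small in-memory dataset. ORDERS, recommend and the data are illustrative assumptions, not Neo4j APIs. The sketch collects the other users and the shared products once, expands each other user a single time, and excludes candidates with a set lookup in place of a per-row pattern check.

```python
from collections import Counter

# Hypothetical toy data: user id -> set of product ids that user ORDERS.
ORDERS = {
    "u171": {"p1", "p2", "p3", "p4"},
    "u2": {"p1", "p2", "p3", "p4", "p5", "p6"},
    "u3": {"p1", "p2", "p3", "p5", "p7"},
}


def recommend(orders, user, limit=200):
    # Step 1 (MATCH ... WITH COLLECT(DISTINCT ...)): gather, once, the other
    # users who share a product with `user` and the shared products themselves.
    others, shared = set(), set()
    for p in orders[user]:
        for other, prods in orders.items():
            if other != user and p in prods:
                others.add(other)
                shared.add(p)
    # Step 2 (UNWIND + MATCH + WHERE NOT recommended IN sharedProds): expand
    # each other user once; exclusion is a set lookup, not a graph traversal.
    freq = Counter()
    for other in others:
        for rec in orders[other]:
            if rec not in shared:
                # each other user is visited once, so this counts distinct users
                freq[rec] += 1
    # ORDER BY frequency DESC LIMIT `limit`
    return freq.most_common(limit)


print(recommend(ORDERS, "u171"))
```

With this data, p5 ranks first with a frequency of 2 (ordered by both u2 and u3), followed by p6 and p7 with 1 each. The order among tied products is not fixed, just as ORDER BY frequency alone does not fix it in Cypher.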