Using neo4j cypher, what query would efficiently find actors who are not connected to Kevin Bacon? We can say that 'not connected' means that an actor is not connected to Kevin Bacon by at least 10 hops for simplicity.
Here is what I have attempted:
MATCH (kb:Actor {name:'Kevin Bacon'})-[*1..10]-(h:Actor) with h
MATCH (a)-[:ACTS_IN]->(m)
WHERE a <> h
RETURN DISTINCT h.name
However, this query runs for 3 days. How can I do this more efficiently?
(A) Your first MATCH
finds every actor that is connected within 10 hops to Kevin Bacon. The result of this clause is a number (M) of rows (and if an actor is connected in, say, 7 different ways to Kevin, then that actor is represented in 7 rows).
(B) Your second MATCH
finds every actor that has acted in a movie. If this MATCH
clause were standalone, then it would require N rows, where N is the number of ACTS_IN
relationships (and if an actor acted in, say, 9 movies, then that actor would be represented in 9 rows). However, since the clause comes right after another MATCH
clause, you get a cartesian product and the actual number of result rows is M*N.
So, your query requires a lot of storage and performs a (potentially large) number of redundant comparisons, and your results can contain duplicate names. To reduce the storage requirements and the number of actor comparisons (in your WHERE
clause): you should cause the results of A and B to have distinct actors, and eliminate the cartesian product.
The following query should do that. It first collects a single list (in a single row) of every distinct actor that is connected within 10 hops to Kevin Bacon (as hs
), and then finds all (distinct) actors not in that collection:
MATCH (kb:Actor {name:'Kevin Bacon'})-[*..10]-(h:Actor)
WITH COLLECT(DISTINCT h) AS hs
MATCH (a:Actor)
WHERE NOT a IN hs
RETURN a.name;
(This query also saves even more time by not bothering to test whether an actor has acted in a movie.)
The performance would still depend on how long it takes to perform the variable length path search in the first MATCH
, however.