
Neo4j similarity of single node with entire graph


I'm trying to use GDS in Neo4j to calculate similarities. I understand how to get GDS to calculate all the similarities in the in-memory graph, but that gives me the similarity of every pair of nodes across the whole graph.

What I want instead is, given a single node N, the similarity of N with every other node. I would expect that to be much faster than computing all pairs.

I tried to express this with a query of this type:

CALL gds.nodeSimilarity.stream('test', { relationshipWeightProperty: 'strength', similarityCutoff: 0.1 })
YIELD node1, node2, similarity
WITH gds.util.asNode(node1) AS n1, gds.util.asNode(node2) AS n2, similarity
WHERE n1.name = "Chair1"
RETURN n1.name, n2.name, similarity
ORDER BY n1.name

But what is really happening under the hood? Is GDS:

A) calculating ALL the similarities between every node1 and node2 and then filtering the results only for Chair1?

OR

B) calculating ONLY the similarities between Chair1 and every other node?

I need behaviour B, but after some testing with the airport databases, the execution time is shorter without the WHERE clause than with it, so I suspect it is behaviour A.

Is there a way to force behaviour B?


Solution

  • As commented by a Neo4j developer, for the snippet above GDS currently calculates all the similarities and then post-filters the results: the WHERE clause is applied to the result stream from the node similarity algorithm, so it does not reduce the work the algorithm does.

    More sophisticated filters are planned for release with version 2.1; in the meantime, this answer may clarify the behaviour for some people.
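
    For readers on a later GDS release, here is a rough sketch of what behaviour B might look like with the Filtered Node Similarity procedure. It assumes a version that ships that procedure; I believe it appeared as gds.alpha.nodeSimilarity.filtered.stream in 2.1 and was later promoted out of the alpha namespace, so check the docs for your version. The sourceNodeFilter value and the unlabelled MATCH are assumptions for illustration, not something taken from the question.

```cypher
// Look up the single source node first (no label is given in the
// question, so none is assumed here).
MATCH (c {name: 'Chair1'})
WITH id(c) AS chairId
// Assumption: the filtered variant is available under this name in
// your GDS version. sourceNodeFilter restricts the source side of
// every compared pair to the given node id.
CALL gds.alpha.nodeSimilarity.filtered.stream('test', {
  sourceNodeFilter: chairId,
  relationshipWeightProperty: 'strength',
  similarityCutoff: 0.1
})
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS source,
       gds.util.asNode(node2).name AS other,
       similarity
ORDER BY similarity DESC
```

    Unlike the WHERE clause in the question, the filter here is passed to the algorithm itself, so pairs that do not start at Chair1 should not be computed at all. Whether that is actually faster on a given graph is worth measuring rather than assuming.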