I have multiple nodes in my Neo4j graph. I want to create a relationship between any two nodes if and only if the Jaccard similarity of their attributes is above some threshold alpha.
Consider 3 nodes:
Node 1: {id:1, abc: 1.1, eww: -9.4, ssv: "likj"}
Node 2: {id:2, we2: 1, eww: 900}
Node 3: {id:3, kuku: -91, lulu: 383, ssv: "bubu"}
So the Jaccard similarity of Node 1 and Node 2 on their attributes would be: (intersection =) 2 / (union =) 5 = 0.4
How can I do this in Neo4j? I know there is a Jaccard similarity function, but how do I configure it to work on the attributes of the nodes?
Assuming you mean the Jaccard similarity of the sets of property keys (that is, which properties are present on each node, regardless of their values), you could do something like this:
MATCH (a:Node)
MATCH (b:Node) WHERE id(b) > id(a)
WITH a, b, [prop IN keys(a) WHERE prop IN keys(b)] AS shared_properties // Find the properties that exist on both nodes using the IN operator
WITH a, b, size(shared_properties) AS shared_property_count // Get the number of shared properties
WITH 1.0*shared_property_count / size(apoc.coll.union(keys(a), keys(b))) AS jaccard_similarity, a, b // Compute the Jaccard similarity as the intersection over the union
WHERE jaccard_similarity > $threshold // Make sure the similarity is higher than some threshold
CREATE (a)-[:SIMILAR_TO {jaccard: jaccard_similarity}]->(b)
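The WITH clauses find the properties that are present on both nodes and count them, then compute the Jaccard similarity as that count divided by the size of the union of the two key lists. The WHERE id(b) > id(a) condition ensures each pair of nodes is considered only once.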
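To sanity-check the values the query should produce, here is a minimal Python sketch that applies the same key-based Jaccard computation to the three sample nodes from the question. It does not talk to Neo4j; the nodes are plain dicts, and the helper names key_jaccard and similar_pairs are made up for this illustration. As in the question's example, the id property counts as a key.

```python
from itertools import combinations

# The three sample nodes from the question, as plain dicts of properties.
nodes = [
    {"id": 1, "abc": 1.1, "eww": -9.4, "ssv": "likj"},
    {"id": 2, "we2": 1, "eww": 900},
    {"id": 3, "kuku": -91, "lulu": 383, "ssv": "bubu"},
]


def key_jaccard(a, b):
    """Jaccard similarity of the property-key sets of two nodes (values are ignored)."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    if not union:  # two nodes with no properties at all
        return 0.0
    return len(keys_a & keys_b) / len(union)


def similar_pairs(nodes, threshold):
    """Return (id_a, id_b, similarity) for each unordered pair above the threshold."""
    result = []
    for a, b in combinations(nodes, 2):  # each pair once, like id(b) > id(a)
        similarity = key_jaccard(a, b)
        if similarity > threshold:
            result.append((a["id"], b["id"], similarity))
    return result


if __name__ == "__main__":
    for pair in similar_pairs(nodes, threshold=0.0):
        print(pair)
```

With these sample nodes, only the pair (1, 2) scores 0.4; the pairs (1, 3) and (2, 3) score 2/6 and 1/6, so a threshold of 0.35 would create a single relationship.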