neo4j, cypher, neo4j-apoc

Create relationships between nodes based on the Jaccard similarity of the nodes' attributes in Neo4j?


I have multiple nodes in my Neo4j graph. I want to create a relationship between any 2 nodes if and only if the Jaccard similarity of their attributes is above some threshold alpha.

Consider 3 nodes:

Node 1: {id:1, abc: 1.1, eww: -9.4, ssv: "likj"}
Node 2: {id:2, we2: 1, eww: 900}
Node 3: {id:3, kuku: -91, lulu: 383, ssv: "bubu"}

So the Jaccard similarity of Node 1 and Node 2 on their attributes would be: (intersection =) 2 / (union =) 5 = 0.4, since they share id and eww out of five distinct attribute names (id, abc, eww, ssv, we2).

How can I do this in Neo4j? I know there is a Jaccard similarity function, but how do I configure it to work on the attributes of the nodes?


Solution

  • Assuming you mean the Jaccard similarity of the presence of properties (that is, of the property names, not their values), you could do something like this:

    MATCH (a:Node)
    // Consider each unordered pair of nodes only once
    MATCH (b:Node) WHERE id(b) > id(a)
    // Find the properties that exist on both nodes using the IN operator
    WITH a, b, [prop IN keys(a) WHERE prop IN keys(b)] AS shared_properties
    // Get the number of shared properties
    WITH a, b, size(shared_properties) AS shared_property_count
    // Compute the Jaccard similarity as the intersection over the union
    WITH 1.0 * shared_property_count / size(apoc.coll.union(keys(a), keys(b))) AS jaccard_similarity, a, b
    // Keep only pairs whose similarity is higher than the threshold
    WHERE jaccard_similarity > $threshold
    CREATE (a)-[:SIMILAR_TO {jaccard: jaccard_similarity}]->(b)
    

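    The WITH clauses find the property keys that are present on both nodes, count them, and divide that count by the size of the union of both key lists to get the Jaccard similarity. $threshold is a query parameter that you supply, and apoc.coll.union requires the APOC library to be installed.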
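
    As a quick sanity check, the same arithmetic can be tried on the question's Node 1 and Node 2 without touching the graph. This is only a sketch: it uses literal maps in place of real nodes (n1, n2, k1, k2 and shared are names made up for the illustration) and it swaps apoc.coll.union for the identity |A ∪ B| = |A| + |B| - |A ∩ B|, which holds here because the keys within one map are unique.

    // Literal maps standing in for Node 1 and Node 2 from the question
    WITH {id: 1, abc: 1.1, eww: -9.4, ssv: "likj"} AS n1,
         {id: 2, we2: 1, eww: 900} AS n2
    WITH keys(n1) AS k1, keys(n2) AS k2
    // Keys present in both maps: id and eww
    WITH k1, k2, [k IN k1 WHERE k IN k2] AS shared
    // Union size = 4 + 3 - 2 = 5, so the result is 2 / 5
    RETURN 1.0 * size(shared) / (size(k1) + size(k2) - size(shared)) AS jaccard

    This should return 0.4, matching the value worked out in the question, so a threshold below 0.4 would link these two nodes.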