neo4j cypher levenshtein-distance

Find similarity between nodes


I have lots of profiles stored as nodes and would like to match nodes whose name properties are similar as strings, within some threshold.

How is that possible with Neo4j?

Example data:

NodeA: {
    "name": "Jacob F Saxberg"
},
NodeB: {
    "name": "Jacob Friis Saxberg"
}

I'd like to get the Levenshtein distance (4 for this pair), or a similar measure, with Neo4j.


Solution

  • The Levenshtein distance is a function of two nodes, f(nodeA, nodeB), and it is symmetric: f(nodeA, nodeB) == f(nodeB, nodeA). That makes it a good candidate for storing as a property on a relationship between nodeA and nodeB.

    You can use Cypher to find all the nodes for which the Levenshtein distance should be calculated. Using Java (or your preferred client language), you can iterate over the nodes found, do the math, and write the result back into the graph.
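
The "do the math" step could look roughly like the sketch below. It is a minimal illustration in plain Java, not a definitive implementation: the class name NameSimilarity and the helper pairwiseDistances are hypothetical, and the list of names stands in for whatever your Cypher query returns. Reading the nodes and writing the relationship back through a Neo4j driver is deliberately left out. Because the distance is symmetric, the sketch only visits each unordered pair once.

```java
import java.util.ArrayList;
import java.util.List;

public class NameSimilarity {

    /** Levenshtein distance via dynamic programming, keeping only two rows. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting i characters from a
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Swap rows for the next iteration.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Hypothetical helper: for every unordered pair (i, j) with i < j,
     * returns {i, j, distance}. The indices refer to positions in the
     * input list, which stands in for the nodes returned by a query.
     */
    public static List<int[]> pairwiseDistances(List<String> names) {
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            for (int j = i + 1; j < names.size(); j++) {
                result.add(new int[] {i, j, levenshtein(names.get(i), names.get(j))});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The two example names from the question.
        System.out.println(levenshtein("Jacob F Saxberg", "Jacob Friis Saxberg")); // prints 4
    }
}
```

Comparing every pair is quadratic in the number of profiles, so for a large graph you would probably want the Cypher query to narrow the candidates first (for example, to names sharing a prefix) before computing distances on the client.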