ruby parsing names

Parsing Human Names and matching them in Ruby


I'm looking for a gem or project that would let me identify when two names refer to the same person. For example:

J.R. Smith == John R. Smith == John Smith == John Roy Smith == Johnny Smith

I think you get the idea. I know nothing is going to be 100% accurate, but I'd like something that at least handles the majority of cases. I know that the last one is probably going to need a database of nicknames.


Solution

  • I think one option would be to use a Ruby implementation of the Levenshtein distance:

    The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character.

    Then you could decide that two names with a distance less than X belong to the same person, where X is a threshold you will have to tune against your own data.

    EDIT: After a little searching I found another algorithm, called Metaphone, which is based on phonetics (how a name sounds) rather than on spelling.

    Both approaches still have a lot of holes (initials and nicknames, for instance), but I think that in this case the best anyone can do is give you alternatives to test so you can see what works best.
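
To make the Levenshtein idea concrete, here is a minimal sketch in plain Ruby. The helper names (`levenshtein`, `normalize_name`, `same_person?`) and the default threshold of 4 are my own assumptions for illustration, not part of any gem, and the threshold in particular is something you would have to tune.

```ruby
# Levenshtein distance via dynamic programming, keeping one row at a time.
def levenshtein(a, b)
  a = a.chars
  b = b.chars
  return b.length if a.empty?
  return a.length if b.empty?

  # prev[j] holds the distance between the processed prefix of a and b[0...j].
  prev = (0..b.length).to_a
  a.each_with_index do |ca, i|
    curr = [i + 1]
    b.each_with_index do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j + 1] + 1,   # deletion
               curr[j] + 1,       # insertion
               prev[j] + cost].min # substitution (or match)
    end
    prev = curr
  end
  prev.last
end

# Hypothetical normalization: lowercase, drop punctuation, collapse whitespace.
def normalize_name(name)
  name.downcase.gsub(/[^a-z\s]/, "").split.join(" ")
end

# "Distance less than X" from the answer above; X = 4 is an arbitrary guess.
def same_person?(a, b, threshold: 4)
  levenshtein(normalize_name(a), normalize_name(b)) < threshold
end

puts same_person?("John Smith", "Johnny Smith")   # distance 2
puts same_person?("J.R. Smith", "John Roy Smith") # distance 6
```

With this threshold the first pair matches and the second does not, which shows one of the holes: initials cost as many edits as the letters they abbreviate. It also errs the other way, since "John Smith" and "Joan Smith" are only one edit apart. If you would rather not maintain this yourself, I believe the `text` gem provides both `Text::Levenshtein.distance` and `Text::Metaphone.metaphone`, but check its documentation before relying on that.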