I want to compare names that appear in different formats, e.g. "George W. Bush", "George Bush", "George Walker Bush", "Bush, George Walker", "Bush, GW", "Bush, George", etc. There are a few more with dots ("."), but I omitted those from the list because I will normalize them anyway. In fact, the commas (",") will be stripped as well.
What is the best, most efficient approach to determine whether any two given names actually represent the same person? I have thought of using nameparser and building a comparison algorithm on top of it, but please suggest any other options. An approach that uses only Python's standard modules would be fine too.
There's an open-source library, whoswho, that can be useful here, or that can at least serve as a base to build more functionality on.
Sample usage:
>>> from whoswho import who
>>> who.match('Bush, G.W.', 'George W. Bush')
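If you would rather stay with the standard library, here is a minimal sketch of one possible approach. It is not how whoswho or nameparser work internally, and the function names (`parse_name`, `same_person`) are made up for illustration. It assumes that a comma means "Last, Given ..." order (so the comma is used before it is stripped), that the surname is otherwise the final token, and that a short all-caps token such as "GW" is a run of initials. Two names are treated as the same person when the surnames are equal and each pair of given names is either identical or one is the initial of the other.

```python
def _expand_initials(token):
    # Heuristic (assumption): a 2-3 letter all-caps token like "GW" is a run of initials.
    if token.isupper() and 1 < len(token) <= 3:
        return list(token)
    return [token]


def parse_name(raw):
    """Return (surname, [given name tokens]), all lowercased."""
    raw = raw.replace(".", " ")
    if "," in raw:
        # Assumed "Last, Given ..." order.
        last, _, given = raw.partition(",")
        last = " ".join(last.split())
        given_tokens = given.split()
    else:
        # Assumed "Given ... Last" order with a single-word surname.
        tokens = raw.split()
        if not tokens:
            return "", []
        last, given_tokens = tokens[-1], tokens[:-1]
    expanded = [part for tok in given_tokens for part in _expand_initials(tok)]
    return last.lower(), [tok.lower() for tok in expanded]


def _compatible(a, b):
    # Equal tokens match; a single letter matches any token starting with it.
    if a == b:
        return True
    if len(a) == 1:
        return b.startswith(a)
    if len(b) == 1:
        return a.startswith(b)
    return False


def same_person(name_a, name_b):
    last_a, given_a = parse_name(name_a)
    last_b, given_b = parse_name(name_b)
    if last_a != last_b:
        return False
    if not given_a or not given_b:
        # A bare surname is too ambiguous to call a match.
        return False
    # zip() stops at the shorter list, so a missing middle name is tolerated.
    return all(_compatible(x, y) for x, y in zip(given_a, given_b))
```

For example, `same_person("George W. Bush", "Bush, GW")` and `same_person("George Bush", "George Walker Bush")` both return `True`, while `same_person("George Bush", "Jeb Bush")` returns `False`. Multi-word surnames written without a comma, nicknames, and misspellings are not handled; for those, a dedicated parser or a fuzzy comparison is probably the better route.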