
Simple way to Capital Case Proper Names with Surname Prefixes


I am hoping to import names from a data source in which people have been inconsistent with the capitalization of the names:

  • JOHN BROWN
  • Kathy V Simmons
  • juan velasquez

My first approach was to use title():

name_object = {
    "first_name": row['First Name'].title(),
    "last_name": row['Last Name'].title(),
    "mi": row['MI'].title()
}

But of course (and my Irish ancestors are rolling in their graves) this breaks names like McKinley, DeSantis, etc.
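To make the failure concrete, here is a small sketch of what title() does to the sample names above plus the two problem surnames. The list contents are only illustrative inputs, not real data from the source.

```python
# Illustrative inputs: the three sample names plus two mixed-case surnames.
names = ["JOHN BROWN", "Kathy V Simmons", "juan velasquez", "McKinley", "DeSantis"]

# str.title() uppercases the first letter of each word and lowercases the rest,
# so any capital in the middle of a word is lost.
titled = [name.title() for name in names]

for original, result in zip(names, titled):
    print(f"{original!r} -> {result!r}")
```

The first three names come out as wanted, but McKinley becomes Mckinley and DeSantis becomes Desantis.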

In this post someone has rolled their own version of title() using capitalize(). But disambiguating between names that begin with a "prefix" like "de" or "Di" and ones that just happen to start with those letters (Diaz) seems like it is going to make less sense than really encouraging people to use a consistent approach in the initial data entry.

Is there a relatively simple automated approach I'm not thinking of?


Solution

  • There is no straightforward way to handle those prefixes. The information you need to disambiguate the different usages is not contained within the text itself.

    The root problem is that you need too much cultural data to distinguish, say, "D'Arc" from "Darc". Both names are French, so you would need more than the text of the name to tell them apart. Similar problems arise between languages and across centuries, and from variant spellings, transcription errors, etc.
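One way to see the point in code: since the text alone cannot settle the spelling, the extra information has to be supplied from outside. The following is a minimal sketch, assuming you are willing to maintain a table of known spellings by hand. The names SURNAME_OVERRIDES and proper_case are hypothetical, and the table entries are only examples.

```python
# Hypothetical, hand-maintained table: lowercased surname -> preferred spelling.
# This is the "cultural data" that the raw text does not carry.
SURNAME_OVERRIDES = {
    "mckinley": "McKinley",
    "desantis": "DeSantis",
}

def proper_case(name, overrides=SURNAME_OVERRIDES):
    """Return the curated spelling if one is known, else fall back to str.title()."""
    cleaned = name.strip()
    return overrides.get(cleaned.lower(), cleaned.title())

for raw in ["MCKINLEY", "desantis", "diaz", "Brown"]:
    print(raw, "->", proper_case(raw))
```

This does not solve the ambiguity. It only moves the decision into the table, and any surname that is not listed still gets plain title() casing. Two people whose names differ only in capitalization would also collide on the same key.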