Search code examples
pandasdataframecase

Change case and reorder string fields in pandas dataframe


I have created the following dataframe:

import pandas as pd
ds1 = {'Name':["CARO QUINTERO, Miguel Angel","CHANG, Ping Yun","GILBOA, Joseph"]}

df1 = pd.DataFrame(data=ds1)

Which looks like this:

                          Name
0  CARO QUINTERO, Miguel Angel
1              CHANG, Ping Yun
2               GILBOA, Joseph

So, the text before the comma is the "last name", whilst the text after the comma is the first name (s).

I need to create a new column (called matchKey) such that:

  • it contains first and last name (in that order)
  • the first letter of the first name and last name is in upper case.

So, from the example above, the resulting dataframe would look like this:

[![enter image description here][1]][1]

Does anybody know how to do to it in pandas? [1]: https://i.sstatic.net/By1Uz.png


Solution

  • Use Series.str.replace with Series.str.title:

    df1['matchKey']=df1['Name'].str.replace(r'^(.*),\s*(.*)$',r'\2 \1', regex=True).str.title()
    print (df1)
                              Name                    matchKey
    0  CARO QUINTERO, Miguel Angel  Miguel Angel Caro Quintero
    1              CHANG, Ping Yun              Ping Yun Chang
    2               GILBOA, Joseph               Joseph Gilboa
    

    Or Series.str.extract with Series.str.cat:

    s = df1['Name'].str.extract(r'^(.*),(.*)$')
    df1['matchKey'] = s[1].str.cat( s[0].str.title(), sep=' ')
    print (df1)
                              Name                     matchKey
    0  CARO QUINTERO, Miguel Angel   Miguel Angel Caro Quintero
    1              CHANG, Ping Yun               Ping Yun Chang
    2               GILBOA, Joseph                Joseph Gilboa