I have created the following dataframe:
import pandas as pd
ds1 = {'Name':["CARO QUINTERO, Miguel Angel","CHANG, Ping Yun","GILBOA, Joseph"]}
df1 = pd.DataFrame(data=ds1)
Which looks like this:
Name
0 CARO QUINTERO, Miguel Angel
1 CHANG, Ping Yun
2 GILBOA, Joseph
So, the text before the comma is the "last name", whilst the text after the comma is the first name (s).
I need to create a new column (called matchKey
) such that:
So, from the example above, the resulting dataframe would look like this:
[![enter image description here][1]][1]
Does anybody know how to do to it in pandas? [1]: https://i.sstatic.net/By1Uz.png
Use Series.str.replace
with Series.str.title
:
df1['matchKey']=df1['Name'].str.replace(r'^(.*),\s*(.*)$',r'\2 \1', regex=True).str.title()
print (df1)
Name matchKey
0 CARO QUINTERO, Miguel Angel Miguel Angel Caro Quintero
1 CHANG, Ping Yun Ping Yun Chang
2 GILBOA, Joseph Joseph Gilboa
Or Series.str.extract
with Series.str.cat
:
s = df1['Name'].str.extract(r'^(.*),(.*)$')
df1['matchKey'] = s[1].str.cat( s[0].str.title(), sep=' ')
print (df1)
Name matchKey
0 CARO QUINTERO, Miguel Angel Miguel Angel Caro Quintero
1 CHANG, Ping Yun Ping Yun Chang
2 GILBOA, Joseph Joseph Gilboa