i am trying to break the name into two parts and keeping first name last name and finally replacing the common part in all of them such that first name is must then last name then if middle name remain it is added to column
df['owner1_first_name'] = df['owner1_name'].str.split().str[0].astype(str,
errors='ignore')
df['owner1_last_name'] =
df['owner1_name'].str.split().str[-1].str.replace(df['owner1_first_name'],
"").astype(str, errors='ignore')
['owner1_middle_name'] =
df['owner1_name'].str.replace(df['owner1_first_name'],
"").str.replace(df['owner1_last_name'], "").astype(str, errors='ignore')
the problem is i am not able to use
.str.replace(df['owner1_name'], "")
as i am getting an error
"TypeError: 'Series' objects are mutable, thus they cannot be hashed"
is there any replacement sytax in pandas for what i am tryin to achieve
my desired output is
full name = THOMAS MARY D which is in column owner1_name
I want
owner1_first_name = THOMAS
owner1_middle_name = MARY
owner1_last_name = D
I think you need mask
which replace if same values in both columns to empty strings:
df = pd.DataFrame({'owner1_name':['THOMAS MARY D', 'JOE Long', 'MARY Small']})
splitted = df['owner1_name'].str.split()
df['owner1_first_name'] = splitted.str[0]
df['owner1_last_name'] = splitted.str[-1]
df['owner1_middle_name'] = splitted.str[1]
df['owner1_middle_name'] = df['owner1_middle_name']
.mask(df['owner1_middle_name'] == df['owner1_last_name'], '')
print (df)
owner1_name owner1_first_name owner1_last_name owner1_middle_name
0 THOMAS MARY D THOMAS D MARY
1 JOE Long JOE Long
2 MARY Small MARY Small
What is same as:
splitted = df['owner1_name'].str.split()
df['owner1_first_name'] = splitted.str[0]
df['owner1_last_name'] = splitted.str[-1]
middle = splitted.str[1]
df['owner1_middle_name'] = middle.mask(middle == df['owner1_last_name'], '')
print (df)
owner1_name owner1_first_name owner1_last_name owner1_middle_name
0 THOMAS MARY D THOMAS D MARY
1 JOE Long JOE Long
2 MARY Small MARY Small
EDIT:
For replace
by rows is possible use apply
with axis=1
:
df = pd.DataFrame({'owner1_name':['THOMAS MARY-THOMAS', 'JOE LongJOE', 'MARY Small']})
splitted = df['owner1_name'].str.split()
df['a'] = splitted.str[0]
df['b'] = splitted.str[-1]
df['c'] = df.apply(lambda x: x['b'].replace(x['a'], ''), axis=1)
print (df)
owner1_name a b c
0 THOMAS MARY-THOMAS THOMAS MARY-THOMAS MARY-
1 JOE LongJOE JOE LongJOE Long
2 MARY Small MARY Small Small
the exact code to in three line to achieve what i wanted in my question is
df['owner1_first_name'] = df['owner1_name'].str.split().str[0]
df['owner1_last_name'] = df.apply(lambda x: x['owner1_name'].split()
[-1].replace(x['owner1_first_name'], ''), axis=1)
df['owner1_middle_name'] = df.apply(lambda x:
x['owner1_name'].replace(x['owner1_first_name'],
'').replace(x['owner1_last_name'], ''), axis=1)