Search code examples
pythonpandasdata-cleaning

i am trying to split a full name to first middle and last name in pandas but i am stuck at replace


i am trying to break the name into two parts and keeping first name last name and finally replacing the common part in all of them such that first name is must then last name then if middle name remain it is added to column

df['owner1_first_name'] = df['owner1_name'].str.split().str[0].astype(str, 
errors='ignore')
df['owner1_last_name'] = 
df['owner1_name'].str.split().str[-1].str.replace(df['owner1_first_name'], 
"").astype(str, errors='ignore')
['owner1_middle_name'] = 
df['owner1_name'].str.replace(df['owner1_first_name'], 
"").str.replace(df['owner1_last_name'], "").astype(str, errors='ignore')

the problem is i am not able to use .str.replace(df['owner1_name'], "")
as i am getting an error "TypeError: 'Series' objects are mutable, thus they cannot be hashed"

is there any replacement sytax in pandas for what i am tryin to achieve

my desired output is

full name = THOMAS MARY D which is in column owner1_name

I want

owner1_first_name = THOMAS
owner1_middle_name = MARY
owner1_last_name = D

Solution

  • I think you need mask which replace if same values in both columns to empty strings:

    df = pd.DataFrame({'owner1_name':['THOMAS MARY D', 'JOE Long', 'MARY Small']})
    
    splitted = df['owner1_name'].str.split()
    df['owner1_first_name'] = splitted.str[0]
    df['owner1_last_name'] = splitted.str[-1]
    df['owner1_middle_name'] = splitted.str[1]
    df['owner1_middle_name'] = df['owner1_middle_name']
                                 .mask(df['owner1_middle_name'] == df['owner1_last_name'], '')
    print (df)
         owner1_name owner1_first_name owner1_last_name owner1_middle_name
    0  THOMAS MARY D            THOMAS                D               MARY
    1       JOE Long               JOE             Long                   
    2     MARY Small              MARY            Small  
    

    What is same as:

    splitted = df['owner1_name'].str.split()
    df['owner1_first_name'] = splitted.str[0]
    df['owner1_last_name'] = splitted.str[-1]
    middle = splitted.str[1] 
    df['owner1_middle_name'] = middle.mask(middle == df['owner1_last_name'], '')
    print (df)
         owner1_name owner1_first_name owner1_last_name owner1_middle_name
    0  THOMAS MARY D            THOMAS                D               MARY
    1       JOE Long               JOE             Long                   
    2     MARY Small              MARY            Small                   
    

    EDIT:

    For replace by rows is possible use apply with axis=1:

    df = pd.DataFrame({'owner1_name':['THOMAS MARY-THOMAS', 'JOE LongJOE', 'MARY Small']})
    
    splitted = df['owner1_name'].str.split()
    df['a'] = splitted.str[0]
    df['b'] = splitted.str[-1]
    
    df['c'] = df.apply(lambda x: x['b'].replace(x['a'], ''), axis=1)
    print (df)
              owner1_name       a            b      c
    0  THOMAS MARY-THOMAS  THOMAS  MARY-THOMAS  MARY-
    1         JOE LongJOE     JOE      LongJOE   Long
    2          MARY Small    MARY        Small  Small
    

    the exact code to in three line to achieve what i wanted in my question is

    df['owner1_first_name'] = df['owner1_name'].str.split().str[0]
    df['owner1_last_name'] = df.apply(lambda x: x['owner1_name'].split()
    [-1].replace(x['owner1_first_name'], ''), axis=1)
    df['owner1_middle_name'] = df.apply(lambda x: 
    x['owner1_name'].replace(x['owner1_first_name'], 
    '').replace(x['owner1_last_name'], ''), axis=1)