Search code examples
pythonstringpandasconcatenation

Concatenate cells into a string with separator pandas python


Given the following:

df = pd.DataFrame({'col1' : ["a","b"],
            'col2'  : ["ab",np.nan], 'col3' : ["w","e"]})

I would like to be able to create a column that joins the content of all three columns into one string, separated by the character "*" while ignoring NaN.

so that I would get something like that for example:

a*ab*w
b*e

Any ideas?

Just realised there were a few additional requirements, I needed the method to work with ints and floats and also to be able to deal with special characters (e.g., letters of Spanish alphabet).


Solution

  • In [68]:
    
    df['new_col'] = df.apply(lambda x: '*'.join(x.dropna().values.tolist()), axis=1)
    df
    Out[68]:
      col1 col2 col3 new_col
    0    a   ab    w  a*ab*w
    1    b  NaN    e     b*e
    

    UPDATE

    If you have ints or float you can convert these to str first:

    In [74]:
    
    df = pd.DataFrame({'col1' : ["a","b",3],
                'col2'  : ["ab",np.nan, 4], 'col3' : ["w","e", 6]})
    df
    Out[74]:
      col1 col2 col3
    0    a   ab    w
    1    b  NaN    e
    2    3    4    6
    In [76]:
    
    df['new_col'] = df.apply(lambda x: '*'.join(x.dropna().astype(str).values), axis=1)
    df
    Out[76]:
      col1 col2 col3 new_col
    0    a   ab    w  a*ab*w
    1    b  NaN    e     b*e
    2    3    4    6   3*4*6
    

    Another update

    In [81]:
    
    df = pd.DataFrame({'col1' : ["a","b",3,'ñ'],
                'col2'  : ["ab",np.nan, 4,'ü'], 'col3' : ["w","e", 6,'á']})
    df
    Out[81]:
      col1 col2 col3
    0    a   ab    w
    1    b  NaN    e
    2    3    4    6
    3    ñ    ü    á
    
    In [82]:
    
    df['new_col'] = df.apply(lambda x: '*'.join(x.dropna().astype(str).values), axis=1)
    ​
    df
    Out[82]:
      col1 col2 col3 new_col
    0    a   ab    w  a*ab*w
    1    b  NaN    e     b*e
    2    3    4    6   3*4*6
    3    ñ    ü    á   ñ*ü*á
    

    My code still works with Spanish characters