Given the following:
df = pd.DataFrame({'col1' : ["a","b"],
'col2' : ["ab",np.nan], 'col3' : ["w","e"]})
I would like to be able to create a column that joins the content of all three columns into one string, separated by the character "*" while ignoring NaN
.
so that I would get something like that for example:
a*ab*w
b*e
Any ideas?
Just realised there were a few additional requirements, I needed the method to work with ints and floats and also to be able to deal with special characters (e.g., letters of Spanish alphabet).
In [68]:
df['new_col'] = df.apply(lambda x: '*'.join(x.dropna().values.tolist()), axis=1)
df
Out[68]:
col1 col2 col3 new_col
0 a ab w a*ab*w
1 b NaN e b*e
UPDATE
If you have ints or float you can convert these to str
first:
In [74]:
df = pd.DataFrame({'col1' : ["a","b",3],
'col2' : ["ab",np.nan, 4], 'col3' : ["w","e", 6]})
df
Out[74]:
col1 col2 col3
0 a ab w
1 b NaN e
2 3 4 6
In [76]:
df['new_col'] = df.apply(lambda x: '*'.join(x.dropna().astype(str).values), axis=1)
df
Out[76]:
col1 col2 col3 new_col
0 a ab w a*ab*w
1 b NaN e b*e
2 3 4 6 3*4*6
Another update
In [81]:
df = pd.DataFrame({'col1' : ["a","b",3,'ñ'],
'col2' : ["ab",np.nan, 4,'ü'], 'col3' : ["w","e", 6,'á']})
df
Out[81]:
col1 col2 col3
0 a ab w
1 b NaN e
2 3 4 6
3 ñ ü á
In [82]:
df['new_col'] = df.apply(lambda x: '*'.join(x.dropna().astype(str).values), axis=1)
df
Out[82]:
col1 col2 col3 new_col
0 a ab w a*ab*w
1 b NaN e b*e
2 3 4 6 3*4*6
3 ñ ü á ñ*ü*á
My code still works with Spanish characters