Suppose a dataframe contains
attacker_1 attacker_2 attacker_3 attacker_4 Lannister nan nan nan nan Stark greyjoy nan
I want to create another column called AttackerCombo that aggregates the 4 columns into 1 column. How would I go about defining such code in python? I have been practicing python and I reckon a list comprehension of this sort makes sense, but [list(x) for x in attackers] where attackers is a numpy array of the 4 columns displays all 4 columns aggregated into 1 column, however I would like to remove all the nans as well. So the result for each row instead of looking like
starknannanlannisterwould look like
stark/lannister
I think you need apply
with join
and remove NaN
by dropna
:
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']] \
.apply(lambda x: '/'.join(x.dropna()), axis=1)
print (df)
attacker_1 attacker_2 attacker_3 attacker_4 attackers
0 Lannister NaN NaN NaN Lannister
1 NaN Stark greyjoy NaN Stark/greyjoy
If need separator
empty string use DataFrame.fillna
:
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']].fillna('') \
.apply(''.join, axis=1)
print (df)
attacker_1 attacker_2 attacker_3 attacker_4 attackers
0 Lannister NaN NaN NaN Lannister
1 NaN Stark greyjoy NaN Starkgreyjoy
Another 2 solutions with list comprehension
- first compare by notnull
and second check if string
:
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']] \
.apply(lambda x: '/'.join([e for e in x if pd.notnull(e)]), axis=1)
print (df)
attacker_1 attacker_2 attacker_3 attacker_4 attackers
0 Lannister NaN NaN NaN Lannister
1 NaN Stark greyjoy NaN Stark/greyjoy
#python 3 - isinstance(e, str), python 2 - isinstance(e, basestring)
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']] \
.apply(lambda x: '/'.join([e for e in x if isinstance(e, str)]), axis=1)
print (df)
attacker_1 attacker_2 attacker_3 attacker_4 attackers
0 Lannister NaN NaN NaN Lannister
1 NaN Stark greyjoy NaN Stark/greyjoy