Search code examples
python-3.xpandasnumpynanstring-concatenation

How to combine different columns in a dataframe using comprehension-python


Suppose a dataframe contains

attacker_1 attacker_2  attacker_3  attacker_4
Lannister   nan         nan         nan
nan         Stark       greyjoy     nan

I want to create another column called AttackerCombo that aggregates the 4 columns into 1 column. How would I go about defining such code in python? I have been practicing python and I reckon a list comprehension of this sort makes sense, but [list(x) for x in attackers] where attackers is a numpy array of the 4 columns displays all 4 columns aggregated into 1 column, however I would like to remove all the nans as well. So the result for each row instead of looking like

starknannanlannister
would look like
stark/lannister


Solution

  • I think you need apply with join and remove NaN by dropna:

    df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']] \
                        .apply(lambda x: '/'.join(x.dropna()), axis=1)
    print (df)
      attacker_1 attacker_2 attacker_3  attacker_4      attackers
    0  Lannister        NaN        NaN         NaN      Lannister
    1        NaN      Stark    greyjoy         NaN  Stark/greyjoy
    

    If need separator empty string use DataFrame.fillna:

    df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']].fillna('') \
                        .apply(''.join, axis=1)
    print (df)
      attacker_1 attacker_2 attacker_3  attacker_4     attackers
    0  Lannister        NaN        NaN         NaN     Lannister
    1        NaN      Stark    greyjoy         NaN  Starkgreyjoy
    

    Another 2 solutions with list comprehension - first compare by notnull and second check if string:

    df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']] \
                        .apply(lambda x: '/'.join([e for e in x if pd.notnull(e)]), axis=1)
    print (df)
      attacker_1 attacker_2 attacker_3  attacker_4      attackers
    0  Lannister        NaN        NaN         NaN      Lannister
    1        NaN      Stark    greyjoy         NaN  Stark/greyjoy
    
    
    #python 3 - isinstance(e, str), python 2 - isinstance(e, basestring)
    df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']] \
                        .apply(lambda x: '/'.join([e for e in x if isinstance(e, str)]), axis=1)
    print (df)
      attacker_1 attacker_2 attacker_3  attacker_4      attackers
    0  Lannister        NaN        NaN         NaN      Lannister
    1        NaN      Stark    greyjoy         NaN  Stark/greyjoy