Search code examples
pythonpandasconcatenationdataframe

Concatenate column values in Pandas DataFrame with "NaN" values


I'm trying to concatenate Pandas DataFrame columns with NaN values.

In [96]:df = pd.DataFrame({'col1' : ["1","1","2","2","3","3"],
                'col2'  : ["p1","p2","p1",np.nan,"p2",np.nan], 'col3' : ["A","B","C","D","E","F"]})

In [97]: df
Out[97]: 
  col1 col2 col3
0    1   p1    A
1    1   p2    B
2    2   p1    C
3    2  NaN    D
4    3   p2    E
5    3  NaN    F

In [98]: df['concatenated'] = df['col2'] +','+ df['col3']
In [99]: df
Out[99]: 
  col1 col2 col3 concatenated
0    1   p1    A         p1,A
1    1   p2    B         p2,B
2    2   p1    C         p1,C
3    2  NaN    D          NaN
4    3   p2    E         p2,E
5    3  NaN    F          NaN

Instead of 'NaN' values in "concatenated" column, I want to get "D" and "F" respectively for this example?


Solution

  • I don't think your problem is trivial. However, here is a workaround using numpy vectorization :

    In [49]: def concat(*args):
        ...:     strs = [str(arg) for arg in args if not pd.isnull(arg)]
        ...:     return ','.join(strs) if strs else np.nan
        ...: np_concat = np.vectorize(concat)
        ...: 
    
    In [50]: np_concat(df['col2'], df['col3'])
    Out[50]: 
    array(['p1,A', 'p2,B', 'p1,C', 'D', 'p2,E', 'F'], 
          dtype='|S64')
    
    In [51]: df['concatenated'] = np_concat(df['col2'], df['col3'])
    
    In [52]: df
    Out[52]: 
      col1 col2 col3 concatenated
    0    1   p1    A         p1,A
    1    1   p2    B         p2,B
    2    2   p1    C         p1,C
    3    2  NaN    D            D
    4    3   p2    E         p2,E
    5    3  NaN    F            F
    
    [6 rows x 4 columns]