python pandas merging-data

Collapse overlapping columns in pandas dataframe


I have a data frame that looks like this:

>>> df = pd.DataFrame({'P1':['ARF5','NaN','NaN'],'P2':['NaN','M6PR','NaN'],'P3':['NaN','NaN','NDUFAF7']})
>>> df
     P1    P2       P3
0  ARF5   NaN      NaN
1   NaN  M6PR      NaN
2   NaN   NaN  NDUFAF7

I have been trying to collapse it down to something like this:

        C1
0     ARF5
1     M6PR
2  NDUFAF7

The columns overlap, but I do not know to what degree. I also do not know how many columns this df will have at any iteration, since it is part of a pipeline whose output I need to aggregate.

I think in principle I need the functionality of combine_first but for columns. I tried something like this:

df['condensed'] = reduce(lambda x,y:x.combine_first(y),[df[:]])

or

df['condensed'] = reduce(lambda x,y:x.combine_first(y),[df['P1'],df['P2'],df['P3']])

But I have some issues figuring this out. Thanks for the help!


Solution

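  • Replace the 'NaN' strings with real missing values, then use bfill on axis=1 so that the first non-missing value in each row ends up in the first column, P1:

    import numpy as np

    df['C1'] = df.replace('NaN', np.nan).bfill(axis=1)['P1']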
    
    >>> df
    
         P1    P2       P3       C1
    0  ARF5   NaN      NaN     ARF5
    1   NaN  M6PR      NaN     M6PR
    2   NaN   NaN  NDUFAF7  NDUFAF7
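
Since the question says the number of columns is not known in advance, here is a minimal sketch of the same idea that selects the first column by position instead of by the name P1. It also shows a working version of the reduce / combine_first attempt from the question for comparison. The helper names collapse_columns and collapse_columns_combine_first are hypothetical, and the sketch assumes missing entries are stored as the literal string 'NaN', as in the example frame.

```python
from functools import reduce

import numpy as np
import pandas as pd


def collapse_columns(frame: pd.DataFrame) -> pd.Series:
    """Return the first non-missing value of each row, scanning left to right."""
    # Assumption: missing entries are the literal string 'NaN', as in the question.
    cleaned = frame.replace('NaN', np.nan)
    # After back-filling along each row, the first column holds the first
    # non-missing value, whatever that column is called.
    return cleaned.bfill(axis=1).iloc[:, 0]


def collapse_columns_combine_first(frame: pd.DataFrame) -> pd.Series:
    """Same result, built by folding combine_first over the columns."""
    cleaned = frame.replace('NaN', np.nan)
    # reduce needs a list of Series (one per column), not a list holding the frame.
    columns = [cleaned[name] for name in cleaned.columns]
    return reduce(lambda left, right: left.combine_first(right), columns)


df = pd.DataFrame({'P1': ['ARF5', 'NaN', 'NaN'],
                   'P2': ['NaN', 'M6PR', 'NaN'],
                   'P3': ['NaN', 'NaN', 'NDUFAF7']})

# Remember the input columns so the new column is not fed back into the helper.
source_columns = list(df.columns)
via_combine_first = collapse_columns_combine_first(df[source_columns])
df['C1'] = collapse_columns(df[source_columns])
```

Both helpers give the same C1 values for the example frame. The bfill version is a single vectorised call, while the combine_first version makes one pass per column, so the former is probably the better fit when the pipeline produces many columns.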