I have a data frame that looks like this:
>>> import pandas as pd
>>> df = pd.DataFrame({'P1':['ARF5','NaN','NaN'],'P2':['NaN','M6PR','NaN'],'P3':['NaN','NaN','NDUFAF7']})
>>> df
     P1    P2       P3
0  ARF5   NaN      NaN
1   NaN  M6PR      NaN
2   NaN   NaN  NDUFAF7
I have been trying to collapse it down to something like this:
        C1
0     ARF5
1     M6PR
2  NDUFAF7
The columns overlap, but I do not know to what degree. I also do not know how many columns the DataFrame will have on any given iteration, since it is part of a pipeline whose output I need to aggregate.
I think that in principle I need the functionality of combine_first, but across columns.
I tried something like this:
from functools import reduce
df['condensed'] = reduce(lambda x,y:x.combine_first(y),[df[:]])
or
df['condensed'] = reduce(lambda x,y:x.combine_first(y),[df['P1'],df['P2'],df['P3']])
But I have not been able to get this working. Thanks for the help!
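A likely reason the combine_first approach does not fill anything is that the missing entries are the string 'NaN', not real missing values, so combine_first treats them as ordinary data. A minimal sketch, assuming the placeholders are always exactly the string 'NaN': convert them with replace, then fold combine_first over however many columns there are. With this fold, the leftmost column takes priority wherever rows overlap.

```python
from functools import reduce

import numpy as np
import pandas as pd

df = pd.DataFrame({'P1': ['ARF5', 'NaN', 'NaN'],
                   'P2': ['NaN', 'M6PR', 'NaN'],
                   'P3': ['NaN', 'NaN', 'NDUFAF7']})

# The literal string 'NaN' is not a missing value, so combine_first never replaces it.
print(df['P1'].isna().tolist())  # [False, False, False]

# Assumption: every placeholder is exactly the string 'NaN'; turn them into real missing values.
clean = df.replace('NaN', np.nan)

# Fold combine_first over all columns, whatever their number; earlier columns win on overlap.
condensed = reduce(lambda x, y: x.combine_first(y), [clean[c] for c in clean.columns])
print(condensed.tolist())  # ['ARF5', 'M6PR', 'NDUFAF7']
```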
Use bfill on axis=1:
import numpy as np
df['C1'] = df.replace('NaN', np.nan).bfill(axis=1).iloc[:, 0]
>>> df
     P1    P2       P3       C1
0  ARF5   NaN      NaN     ARF5
1   NaN  M6PR      NaN     M6PR
2   NaN   NaN  NDUFAF7  NDUFAF7
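Since the number of columns changes between pipeline iterations, the same bfill idea can be wrapped in a small helper that never refers to a column by name. This is a sketch only: collapse_columns is a hypothetical name, it assumes missing entries are either real NaN or the string 'NaN', and when a row has several non-missing values it keeps the leftmost one. The 'OTHER' value below is made-up data, added only to show an overlapping row.

```python
import numpy as np
import pandas as pd


def collapse_columns(frame: pd.DataFrame, name: str = 'C1') -> pd.Series:
    """Hypothetical helper: first non-missing value in each row, scanning left to right."""
    # Treat the literal string 'NaN' as missing, as in the question's data.
    cleaned = frame.replace('NaN', np.nan)
    # Back-filling along the columns moves each row's first non-missing value into position 0.
    return cleaned.bfill(axis=1).iloc[:, 0].rename(name)


# Four columns this time, with two values overlapping in row 0 ('OTHER' is illustrative only).
df = pd.DataFrame({'P1': ['NaN', 'NaN', 'NaN'],
                   'P2': ['ARF5', 'M6PR', 'NaN'],
                   'P3': ['OTHER', 'NaN', 'NaN'],
                   'P4': ['NaN', 'NaN', 'NDUFAF7']})
df['C1'] = collapse_columns(df)
print(df['C1'].tolist())  # ['ARF5', 'M6PR', 'NDUFAF7']
```

If the leftmost-wins rule is not what you want for overlapping rows, that choice has to be made explicitly, for example by checking how many non-missing values each row has before collapsing.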