Example:
DF = pd.DataFrame({'A': [0, 0, np.NaN, 0 , np.NaN, 0 , 0 , 0 ],
'B': [1, 1, np.NaN, 1 , np.NaN, 1 , 1 , 1 ],
'C': [8, 8, np.NaN, 8 , np.NaN, np.NaN, 8 , 8 ],
'D': [2, 2, 2 , np.NaN, np.NaN, 2 , np.NaN, np.NaN],
'E': [3, 3, 3 , np.NaN, np.NaN, 3 , np.NaN, np.NaN]})
Expected result that I want is to get the columns A and B as filled as possible, that is:
1) If DF['A'] line is NaN, it should get the correspondent DF['D'] line
2) If DF['B'] line is NaN, it should get the correspondent DF['E'] line
3) DF['C'] shall remain as it is
I'm trying:
DF[['A', 'B']] = DF[['A','B']].fillna(DF[['D','E']])
But it seems it would work only if there was two different dataframes with same columns name. I could split DF in DF1 and DF2, rename DF2['D'] to A and DF2['E'] to B and do:
DF1[['A', 'B']] = DF1[['A','B']].fillna(DF2[['A','B']])
But I don't think this would be the best way. Any ideas?
The actual dataset has 3 million lines, so it would be good to get the most effective solution :)
Thanks!! :)
Use np.where
is a good alternative since it works with the underlying numpy arrays:
DF[['A','B']] = np.where(DF[['A','B']].isna(), DF[['D','E']], DF[['A','B']])
Output:
A B C D E
0 0.0 1.0 8.0 2.0 3.0
1 0.0 1.0 8.0 2.0 3.0
2 2.0 3.0 NaN 2.0 3.0
3 0.0 1.0 8.0 NaN NaN
4 NaN NaN NaN NaN NaN
5 0.0 1.0 NaN 2.0 3.0
6 0.0 1.0 8.0 NaN NaN
7 0.0 1.0 8.0 NaN NaN