I need to populate NaN
values of my df
by a static 0
, starting from the first non-nan value.
In a way, combining method="ffill"
(identify the first value per column, and only act on following NaN
values) with value=0
(populating by 0, not the variable quantity in df).
How can I do that? This post is close, but not it: How to replace NaNs by preceding or next values in pandas DataFrame?
Example df
0 1 2
0 NaN NaN NaN
1 6.0 NaN 1.0
2 NaN 3.0 NaN
3 NaN NaN 4.0
Desired output:
0 1 2
0 NaN NaN NaN
1 6.0 NaN 1.0
2 0.0 3.0 0.0
3 0.0 0.0 4.0
If possible, df.fillna(value=0, method='ffill')
would be great. But that returns ValueError: Cannot specify both 'value' and 'method'.
Edit: Oh, and time matters. We are talking ~60M rows and 4k columns - so looping is out of the question, and masking only if really, really fast
You can try mask()
, ffill()
and fillna()
:
df=df.fillna(df.mask(df.ffill().notna(),0))
#OR via where
df=df.fillna(df.where(df.ffill().isna(),0))
output:
0 1 2
0 NaN NaN NaN
1 6.0 NaN 1.0
2 0.0 3.0 4.0
3 0.0 0.0 0.0