python pandas fillna

How to fill NaN with 0, starting after the first non-NaN value


I need to fill the NaN values of my df with a static 0, but only from the first non-NaN value of each column onward. NaNs above that first value should stay NaN.

In a way, I want to combine method="ffill" (find the first value per column, and only act on the NaN values that follow it) with value=0 (fill with 0, not with the preceding value from df).

How can I do that? This post is close, but not it: How to replace NaNs by preceding or next values in pandas DataFrame?

Example df

     0    1    2
0  NaN  NaN  NaN
1  6.0  NaN  1.0
2  NaN  3.0  NaN
3  NaN  NaN  4.0

Desired output:

     0    1    2
0  NaN  NaN  NaN
1  6.0  NaN  1.0
2  0.0  3.0  0.0
3  0.0  0.0  4.0

If possible, df.fillna(value=0, method='ffill') would be great. But that raises ValueError: Cannot specify both 'value' and 'method'.

Edit: Oh, and time matters. We are talking ~60M rows and 4k columns, so looping is out of the question, and masking is an option only if it is really, really fast.


Solution

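  • You can combine mask(), ffill() and fillna(). df.ffill() is non-NaN from the first valid value of each column onward, so masking on it gives a frame that holds 0 there and NaN above it. fillna() then takes its fill values from that frame:

    # 0 wherever a value has already been seen in the column, NaN before that
    df = df.fillna(df.mask(df.ffill().notna(), 0))
    # or, equivalently, via where()
    df = df.fillna(df.where(df.ffill().isna(), 0))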
    

    output:

        0       1       2
    0   NaN     NaN     NaN
    1   6.0     NaN     1.0
    2   0.0     3.0     0.0
    3   0.0     0.0     4.0
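
    To try this on the example from the question, here is a minimal, self-contained sketch. The names filled, seen and filled_alt are only illustrative. The second variant, a boolean mask built with notna().cummax(), is not part of the answer above; it is an assumed alternative that works on boolean frames instead of a forward-filled copy. Whether it is any faster at ~60M rows is something you would have to time yourself.

```python
import numpy as np
import pandas as pd

# Example frame from the question.
df = pd.DataFrame(
    [
        [np.nan, np.nan, np.nan],
        [6.0, np.nan, 1.0],
        [np.nan, 3.0, np.nan],
        [np.nan, np.nan, 4.0],
    ]
)

# Approach from the answer: build a frame that holds 0 wherever a value has
# already been seen in the column (NaN before that), and use it as fill source.
filled = df.fillna(df.mask(df.ffill().notna(), 0))

# Assumed alternative (not from the answer): "has a value been seen yet?"
# as a running maximum over the boolean notna() frame, column by column.
seen = df.notna().cummax()
# Replace only those NaNs that come after the first valid value.
filled_alt = df.mask(df.isna() & seen, 0)

print(filled)
print(filled_alt)
```

    Both variants leave the leading NaNs of each column untouched and give the desired output from the question for this frame.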