python pandas pandas-groupby fillna

python conditional grouped ffill


In [179]: test = pd.DataFrame({'A': [1,1,1,2,2,2,2,3,3,4,4], 'B': [-999, np.nan, np.nan, 3
     ...: , np.nan, -999, np.nan, 5,-999,np.nan, 3]})                                     

In [180]: test                                                                            
Out[180]: 
    A      B
0   1 -999.0
1   1    NaN
2   1    NaN
3   2    3.0
4   2    NaN
5   2 -999.0
6   2    NaN
7   3    5.0
8   3 -999.0
9   4    NaN
10  4    3.0

I want to fill the NaN values in column B when the preceding non-NaN value within the same group of column A is -999.

Desired result:

Out[181]: 
    A      B
0   1 -999.0
1   1 -999.0
2   1 -999.0
3   2    3.0
4   2    NaN
5   2 -999.0
6   2 -999.0
7   3    5.0
8   3 -999.0
9   4    NaN
10  4    3.0

I tried checking each row with iterrows, but that does not scale: my real DataFrame has 1M rows.

Thank you


Solution

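  • fillvals = df.groupby('A').ffill()  # df is the frame called test in the question
    # assign the result back; inplace=True on df['B'] may act on a copy in recent pandas
    df['B'] = df['B'].fillna(fillvals[fillvals.eq(-999)]['B'])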
    

    Result:

        A      B
    0   1 -999.0
    1   1 -999.0
    2   1 -999.0
    3   2    3.0
    4   2    NaN
    5   2 -999.0
    6   2 -999.0
    7   3    5.0
    8   3 -999.0
    9   4    NaN
    10  4    3.0
    

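    df.groupby('A').ffill() forward-fills every NaN with the previous value within its group; keep the result as a separate frame (fillvals).

    fillvals[fillvals.eq(-999)]['B'] keeps only the entries of the filled column that equal -999, i.e., the positions whose preceding value is -999; every other entry becomes NaN.

    fillna then fills the NaN values in df['B'] from this filtered column, aligned on the index, so only those positions receive -999.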
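
For reference, here is a self-contained sketch of the same idea applied to the sample data from the question. The names filled, only_sentinel and result are illustrative and not part of the original answer, and the sketch uses Series.where plus a new frame instead of modifying df in place, so treat it as one possible variant rather than the answer's exact code.

```python
import numpy as np
import pandas as pd

# Sample data from the question (called `test` there, `df` in the answer).
df = pd.DataFrame({
    'A': [1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4],
    'B': [-999, np.nan, np.nan, 3, np.nan, -999, np.nan, 5, -999, np.nan, 3],
})

# Forward-fill B within each group of A; df itself is not modified.
filled = df.groupby('A')['B'].ffill()

# Keep the forward-filled value only where it is -999; other positions become NaN.
only_sentinel = filled.where(filled.eq(-999))

# Fill the gaps in B from the masked Series (aligned on the index).
# `result` is a new frame (hypothetical name); the original df is left as-is.
result = df.assign(B=df['B'].fillna(only_sentinel))
print(result)
```

Everything here is vectorized, so it avoids the row-by-row loop that iterrows requires. If another sentinel is needed, replacing -999 in the eq call should be the only change.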