pandas, group-by

Why does .bfill().ffill() act differently than .ffill().bfill() on groups?


I think I'm missing something basic conceptually, but I'm not able to find the answer in the docs.

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 1, 2, 2, 3, 3], 'b': [5, np.nan, 6, np.nan, np.nan, np.nan]})
>>> df
   a    b
0  1  5.0
1  1  NaN
2  2  6.0
3  2  NaN
4  3  NaN
5  3  NaN

Using ffill() and then bfill():

>>> df.groupby('a')['b'].ffill().bfill()
0    5.0
1    5.0
2    6.0
3    6.0
4    NaN
5    NaN

Using bfill() and then ffill():

>>> df.groupby('a')['b'].bfill().ffill()
0    5.0
1    5.0
2    6.0
3    6.0
4    6.0
5    6.0

Doesn't the second way break the groupings? Will the first way always make sure that the values are filled in only with other values in that group?


Solution

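  • I think you need to run both fills inside each group (group_keys=False keeps the original index on newer pandas versions):

    print(df.groupby('a', group_keys=False)['b'].apply(lambda x: x.ffill().bfill()))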
    0    5.0
    1    5.0
    2    6.0
    3    6.0
    4    NaN
    5    NaN
    Name: b, dtype: float64
    
    print(df.groupby('a', group_keys=False)['b'].apply(lambda x: x.bfill().ffill()))
    0    5.0
    1    5.0
    2    6.0
    3    6.0
    4    NaN
    5    NaN
    Name: b, dtype: float64
    

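    This is needed because in your sample only the first ffill or bfill is a groupby method (DataFrameGroupBy.ffill or DataFrameGroupBy.bfill). The second one is called on the Series that the first call returns, so it is the ordinary Series.ffill or Series.bfill. That breaks the groups, because a plain Series has no groups.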

    print(df.groupby('a')['b'].ffill())
    0    5.0
    1    5.0
    2    6.0
    3    6.0
    4    NaN
    5    NaN
    Name: b, dtype: float64
    
    print(df.groupby('a')['b'].bfill())
    0    5.0
    1    NaN
    2    6.0
    3    NaN
    4    NaN
    5    NaN
    Name: b, dtype: float64
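
To answer the second part of the question: as far as I can tell, no, the first ordering is not safe in general either. It only looks correct on the sample because the all-NaN group happens to come last. Here is a minimal sketch that splits the chained call into its two steps and then moves the all-NaN group into the middle. The frame df2 and the variable names are made up for illustration, and transform is used here as an alternative to apply because it returns a result aligned to the original index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 3, 3],
                   'b': [5, np.nan, 6, np.nan, np.nan, np.nan]})

# Step 1 is group-aware and returns a plain Series aligned to df's index.
step1 = df.groupby('a')['b'].bfill()

# Step 2 is Series.ffill. It knows nothing about column 'a',
# so values can flow across group boundaries (6.0 leaks into group 3).
chained = step1.ffill()
print(chained.tolist())

# Doing both fills inside the function passed to transform keeps
# every fill within its own group.
per_group = df.groupby('a')['b'].transform(lambda s: s.ffill().bfill())
print(per_group.tolist())

# Hypothetical variant of the sample: the all-NaN group (a == 2) is in the middle.
df2 = pd.DataFrame({'a': [1, 1, 2, 2, 3, 3],
                    'b': [5, np.nan, np.nan, np.nan, 6, np.nan]})

# Group-aware ffill followed by a plain Series.bfill: group 2 is
# back-filled with the 6.0 that belongs to group 3.
leaky = df2.groupby('a')['b'].ffill().bfill()
print(leaky.tolist())

# The per-group version leaves group 2 as NaN.
safe = df2.groupby('a')['b'].transform(lambda s: s.ffill().bfill())
print(safe.tolist())
```

So whichever order you pick, the second fill has to run inside the groupby (via apply or transform) if values must never cross groups.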