Tags: python, pandas, nan, missing-data, data-cleaning

Fill in missing pandas data with previous non-missing value, grouped by key


I am dealing with pandas DataFrames like this:

   id    x
0   1   10
1   1   20
2   2  100
3   2  200
4   1  NaN
5   2  NaN
6   1  300
7   1  NaN

I would like to replace each NaN in 'x' with the previous non-NaN 'x' value from a row with the same 'id':

   id    x
0   1   10
1   1   20
2   2  100
3   2  200
4   1   20
5   2  200
6   1  300
7   1  300

Is there some slick way to do this without manually looping over rows?


Solution

  • You could group by 'id' and forward-fill 'x' within each group:

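    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'id': [1, 1, 2, 2, 1, 2, 1, 1],
                       'x': [10, 20, 100, 200, np.nan, np.nan, 300, np.nan]})
    # Forward-fill 'x' within each 'id' group; the result keeps df's original
    # index, so it can be assigned straight back to the column.
    df['x'] = df.groupby('id')['x'].ffill()
    print(df)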

    yields

       id      x
    0   1   10.0
    1   1   20.0
    2   2  100.0
    3   2  200.0
    4   1   20.0
    5   2  200.0
    6   1  300.0
    7   1  300.0
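
One edge case worth checking: if a group's first row is NaN, there is no earlier value with the same 'id' to copy, so a grouped forward-fill should leave it as NaN rather than borrow from another group. The sketch below illustrates this. The helper name `ffill_within_groups` and the `demo` data are made up for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd


def ffill_within_groups(df, key, col):
    """Return a copy of df with NaNs in `col` forward-filled per `key` group.

    Hypothetical helper for illustration; it only wraps groupby(...).ffill().
    """
    out = df.copy()
    out[col] = out.groupby(key)[col].ffill()
    return out


# Made-up data: the first row for id 2 is NaN, so it has nothing to copy from.
demo = pd.DataFrame({
    'id': [1, 2, 1, 2, 2],
    'x': [10, np.nan, np.nan, 200, np.nan],
})

filled = ffill_within_groups(demo, 'id', 'x')
print(filled)
```

Here row 1 (the first row for id 2) stays NaN, row 2 picks up 10 from the earlier id 1 row, and row 4 picks up 200 from the earlier id 2 row. If the leading NaNs also need a value, that has to be decided separately, for example with a default via `fillna`.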