Search code examples
pythonpandasdataframenanmissing-data

Pandas : fill NaN with the closest value, according to a category column


Let's take a sample dataframe :

df = pd.DataFrame({"Date": ["2022-10-01","2022-10-02","2022-10-03","2022-10-04","2022-10-05","2022-10-06","2022-10-01","2022-10-02","2022-10-03","2022-10-04","2022-10-05","2022-10-06"],
                   "Animal" :["Cat","Cat","Cat","Cat","Cat","Cat","Dog","Dog","Dog","Dog","Dog","Dog"],
                   "Quantity":[np.nan,4,3,5,1,np.nan,6,5,np.nan,np.nan,2,1]})

          Date Animal  Quantity
0   2022-10-01    Cat       NaN
1   2022-10-02    Cat       4.0
2   2022-10-03    Cat       3.0
3   2022-10-04    Cat       5.0
4   2022-10-05    Cat       1.0
5   2022-10-06    Cat       NaN
6   2022-10-01    Dog       6.0
7   2022-10-02    Dog       5.0
8   2022-10-03    Dog       NaN
9   2022-10-04    Dog       NaN
10  2022-10-05    Dog       2.0
11  2022-10-06    Dog       1.0

I would like to fill the NaN values in the column Quantity using the following method :

  • Replace the NaN values with the closest value that is before the NaN value and which share the same value in Animal column
  • If there is still some NaN values, replace the remaining NaN values with the closest value that is after the Nan value and which share the same value in Animal column

I thought to Series.interpolate but I don't know how to deal with the Animal column. Would you please know an efficient way to reach the expected output ?

Expected output :

          Date Animal  Quantity
0   2022-10-01    Cat         4
1   2022-10-02    Cat         4
2   2022-10-03    Cat         3
3   2022-10-04    Cat         5
4   2022-10-05    Cat         1
5   2022-10-06    Cat         1
6   2022-10-01    Dog         6
7   2022-10-02    Dog         5
8   2022-10-03    Dog         5
9   2022-10-04    Dog         5
10  2022-10-05    Dog         2
11  2022-10-06    Dog         1
``

Solution

  • You can use ffill/bfill per group:

    df['Quantity'] = (df.groupby('Animal', group_keys=False)['Quantity']
                        .apply(lambda s: s.bfill().ffill())
                     )
    

    Output:

              Date Animal  Quantity
    0   2022-10-01    Cat       4.0
    1   2022-10-02    Cat       4.0
    2   2022-10-03    Cat       3.0
    3   2022-10-04    Cat       5.0
    4   2022-10-05    Cat       1.0
    5   2022-10-06    Cat       1.0
    6   2022-10-01    Dog       6.0
    7   2022-10-02    Dog       5.0
    8   2022-10-03    Dog       5.0
    9   2022-10-04    Dog       5.0
    10  2022-10-05    Dog       2.0
    11  2022-10-06    Dog       1.0