Search code examples
pythonpandasfillna

Fillna with the average 3 values, based on several columns


I have a dataframe like the one below:

month      country  prod    sales   units   cust    u/cust      $/unit
4/1/2022    FRANCE  PROD.A  1715    329     148     2.222973    5.212766
4/1/2022    FRANCE  PROD.B  1708    371     156     2.378205    4.603774
4/1/2022    ITALY   PROD.A  1636    396     176     2.250000    4.131313
4/1/2022    ITALY   PROD.B  1965    308     130     2.369231    6.379870
5/1/2022    FRANCE  PROD.A  1623    305     143     2.132867    5.321311
5/1/2022    FRANCE  PROD.B  1791    398     140     2.842857    4.500000
5/1/2022    ITALY   PROD.A  1753    387     134     2.888060    4.529716
5/1/2022    ITALY   PROD.B  1643    394     138     2.855072    4.170051
6/1/2022    FRANCE  PROD.A  1600    399     149     2.677852    4.010025
6/1/2022    FRANCE  PROD.B  1700    327     132     2.477273    5.198777
6/1/2022    ITALY   PROD.A  1619    362     136     2.661765    4.472376
6/1/2022    ITALY   PROD.B  1871    369     150     2.460000    5.070461
7/1/2022    FRANCE  PROD.A  NaN     355     144     2.465278    NaN
7/1/2022    FRANCE  PROD.B  NaN     320     134     2.388060    NaN
7/1/2022    ITALY   PROD.A  NaN     377     136     2.772059    NaN
7/1/2022    ITALY   PROD.B  NaN     363     135     2.688889    NaN

What I am trying to do is to 'fillna' the '$/unit' column with the average of the last 3 values for that country and that product.

i.e, the first NaN should be the average of the $/unit of PROD.A in FRANCE in April, May and June (in this example, the average of 5.21, 5.32 and 4.01, or 4.84).

Thanks to another question in this site, I could do a 'forward fill', taking the last value for that product in that country:

df = df.groupby(['country', 'prod'], as_index=False).apply(lambda group: group.ffill())

But I want to use the average of the last three values, and I really can't figure it out...

Thanks in advance!


Solution

  • Try this

    # average of the last 3 values for each country-prod pair with same index as df
    filler = df.groupby(['country', 'prod'])['$/unit'].transform(lambda x: x.dropna().tail(3).mean())
    # fill missing values by filler
    df['$/unit'].fillna(filler, inplace=True)
    df
    

    enter image description here