Search code examples
pythonpandasdataframedata-cleaningdata-conversion

If date of year 2020 have booking value less than 25 than replace it with last year 7 days value average in pandas


I am working on a pandas dataframe and I have to replace column booking value if less than 25 with average of 7 days example x = 22-03-2020 has booking value 3, I have to replace it

x = average(before 3 days + last year same date + after 3 days/7))

enter image description here

After replace

enter image description here

I have many values to replace, manually i am able to do it but i need some shortcut.


Solution

  • You can use rolling with center=True, but need all values of days, so add Series.asfreq, then add one year to DatetimeIndex and because leap year is 2020 is added mean for aggregate duplicates in end of February, last set values by mask:

    df = pd.DataFrame({'date':pd.date_range('2019-03-19', periods=8),
                     'Booking':[92,109,144,109,122,76,78, 3]}) 
    df.loc[7, 'date'] = pd.to_datetime('2020-03-22')
    
    df = df.set_index('date')
    
    s = df['Booking'].asfreq('d').rolling(7, center=True).mean()
    s.index = s.index + pd.offsets.DateOffset(years=1)
    s = s.mean(level=0)
    print (s)
    date
    2020-03-19           NaN
    2020-03-20           NaN
    2020-03-21           NaN
    2020-03-22    104.285714
    2020-03-23           NaN
       
    2021-03-18           NaN
    2021-03-19           NaN
    2021-03-20           NaN
    2021-03-21           NaN
    2021-03-22           NaN
    Name: Booking, Length: 369, dtype: float64
    

    mask = df['Booking'].lt(25)
    
    df.loc[mask, 'Booking'] = s
    print (df)
                   Booking
    date                  
    2019-03-19   92.000000
    2019-03-20  109.000000
    2019-03-21  144.000000
    2019-03-22  109.000000
    2019-03-23  122.000000
    2019-03-24   76.000000
    2019-03-25   78.000000
    2020-03-22  104.285714