Search code examples
pythonpandasrolling-computation

Pandas: Rolling mean using only the last update based on another column


I would like to perform a rolling mean while the mean excludes duplicates found in another column. Let me provide an example dataframe:

Date            Warehose       Value
10-01-1998      London          10
13-01-1998      London          13
15-01-1998      New York        37
12-02-1998      London          21
20-02-1998      New York        39
21-02-1998      New York        17

In this example, let's say I like to perform 30-day rolling mean of Value but taking into account only the last update of the Warehouse location. The resulting dataframe is expected to be:

 Date         Value     Rolling_Mean
02-01-1998      10           10
05-01-1998      13           13
15-01-1998      37           20
12-02-1998      21           29           
20-02-1998      39           30 
21-02-1998      17           19

The data I have is relatively big so as efficient as possible is appreciated.


Solution

  • It's a bit tricky. As rolling.apply works on Series only and you need both "Wharehose" and "Value" to perform the computation, you need to access the complete dataframe using a function (and a "global" variable, which is not super clean IMO):

    df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
    df2 = df.set_index('Date')
    
    def agg(s):
        return (df2.loc[s.index]
                   .drop_duplicates(subset='Warehose', keep='last')
                   ['Value'].mean()
               )
    
    df['Rolling_Mean'] = (df.sort_values(by='Date')
                            .rolling('30d', on='Date')
                            ['Value']
                            .apply(agg, raw=False)
                          )
    

    output:

            Date  Warehose  Value  Rolling_Mean
    0 1998-01-10    London     10          10.0
    1 1998-01-13    London     13          13.0
    2 1998-01-15  New York     37          25.0
    3 1998-02-12    London     21          29.0
    4 1998-02-20  New York     39          30.0
    5 1998-02-21  New York     17          19.0