Search code examples
amazon-redshiftpartitioningrolling-average

Rolling N monthly average in Redshift with multiple entries per month


I would like to use Redshift's window aggregation functions to create an 'N' month rolling average of some data. The data will have multiple unique entries per any given month. If possible, I'd like to avoid first grouping by and averaging within months before performing rolling average as this is taking an average of an average and not ideal (as this post does: 3 Month Moving Average - Redshift SQL).

This is a sample dataset of just one account (there will be more than 1).

Quote Date      Account.   Value 
3/24/2015       acme.       3
3/25/2015       acme.       7     
4/1/2015        acme.       12          
4/3/2015        acme.       17         
5/15/2015       acme.       1         
6/30/2015       acme.       3
7/30/2015       acme.       9

And this is what I would like the result to look like for a 3 month rolling average (for an example).

Quote_Date      Account.   Value  Month 3M_Rolling_Average
3/24/2015       acme.       3        1       3
3/25/2015       acme.       7        1       5
4/1/2015        acme.       12       2       7.33
4/3/2015        acme.       17       2       9.75
5/15/2015       acme.       1        3       8
6/30/2015       acme.       3        4       8.25
7/30/2015       acme.       9        5       4.33

The code I have tried looks like this:

avg(Value) over (partition by Account order by Quote Date rows between 2 preceding and current row)

But, this only operates over the last 2 rows (and including current row) which would work if I had one unique value for each month but as stated, this is not the case. I am open to any kind of ranking solution or nested partitioning. Any help is greatly appreciated.


Solution

  • Since an average is just the sum() / count(), you just need to group by month but get the sum() and count(). Then use your lag to sum 3 months of sums and divide by the sum of 3 months of counts. You are correct that average of averages is not correct but if you carry the sums and counts things work.