Search code examples
sqlsnowflake-cloud-data-platformwindow-functions

Moving AVG over variable number of rows


In Snowflake, I would like to calculate a moving average for my data, but the number of rows in the average can change. Specifically, I want to factor into the average any records whose date is 30 days or less prior to the date in the current row. But there could be multiple records for each date, or missing dates.

For example:

Date Value
Jan 1 100
Jan 5 200
Jan 20 100
Feb 3 0
Feb 3 500
Feb 10 400
Mar 8 600

I want the output to be:

Date 30d Avg Note
Jan 1 100 (calculated as AVG(100))
Jan 5 150 (calculated as AVG(100, 200))
Jan 20 133 (calculated as AVG(100, 200, 100))
Feb 3 200 (calculated as AVG(200, 100, 0, 500))
Feb 10 250 (calculated as AVG(100, 0, 500, 400))
Mar 8 500 (calculated as AVG(400, 600))

Using window frames doesn't seem to work here, because the number of inputs to the frame can change each time (depending on how many rows are less than 30 days before this one).

How can I do this?


Solution

  • On most databases, this can be done by using the window function AVG with window frame RANGE BETWEEN INTERVAL:

    SELECT Date, MAX(average) AS '30d_avg'
    FROM (
      SELECT Date, AVG(Value) over(ORDER BY Date RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW) as average
      FROM mytable
    ) AS s
    GROUP BY Date
    

    Since Snowflake does not supportRANGE BETWEEN INTERVAL this is a workaround :

    SELECT 
       t1.Date,
       AVG(t2.Value) AS 30d_avg
    FROM mytable AS t1
    INNER JOIN mytable AS t2 ON t2.Date >=  DATEADD(DAY, -30, t1.Date)  
                             AND t2.Date <= t1.Date
    GROUP BY t1.Date
    ORDER BY t1.Date
    

    Demo tested on mysql : https://dbfiddle.uk/miO-qNsf