In Snowflake, I would like to calculate a moving average for my data, but the number of rows in the average can change. Specifically, I want to factor into the average any records whose date is 30 days or less prior to the date in the current row. But there could be multiple records for each date, or missing dates.
For example:
Date | Value |
---|---|
Jan 1 | 100 |
Jan 5 | 200 |
Jan 20 | 100 |
Feb 3 | 0 |
Feb 3 | 500 |
Feb 10 | 400 |
Mar 8 | 600 |
I want the output to be:
Date | 30d Avg | Note |
---|---|---|
Jan 1 | 100 | (calculated as AVG(100) ) |
Jan 5 | 150 | (calculated as AVG(100, 200) ) |
Jan 20 | 133 | (calculated as AVG(100, 200, 100) ) |
Feb 3 | 200 | (calculated as AVG(200, 100, 0, 500) ) |
Feb 10 | 250 | (calculated as AVG(100, 0, 500, 400) ) |
Mar 8 | 500 | (calculated as AVG(400, 600) ) |
Using window frames doesn't seem to work here, because the number of inputs to the frame can change each time (depending on how many rows are less than 30 days before this one).
How can I do this?
On most databases, this can be done by using the window function AVG
with window frame RANGE BETWEEN INTERVAL
:
SELECT Date, MAX(average) AS '30d_avg'
FROM (
SELECT Date, AVG(Value) over(ORDER BY Date RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW) as average
FROM mytable
) AS s
GROUP BY Date
Since Snowflake does not supportRANGE BETWEEN INTERVAL
this is a workaround :
SELECT
t1.Date,
AVG(t2.Value) AS 30d_avg
FROM mytable AS t1
INNER JOIN mytable AS t2 ON t2.Date >= DATEADD(DAY, -30, t1.Date)
AND t2.Date <= t1.Date
GROUP BY t1.Date
ORDER BY t1.Date
Demo tested on mysql : https://dbfiddle.uk/miO-qNsf