Search code examples
sqlamazon-redshiftwindow-functions

Redshift SQL Window Function frame_clause with days


I am trying to perform a window function on a data-set in Redshift using days an an interval for the preceding rows. Example data:

date        ID      score
3/1/2017    123     1
3/1/2017    555     1
3/2/2017    123     1
3/3/2017    555     3
3/5/2017    555     2

SQL window function for avg score from the last 3 scores:

select
      date,
      id,
      avg(score) over 
         (partition by id order by date rows 
              between preceding 3 and 
                      current row) LAST_3_SCORES_AVG,
from DATASET

Result:

date        ID      LAST_3_SCORES_AVG
3/1/2017    123     1
3/1/2017    555     1
3/2/2017    123     1
3/3/2017    555     2
3/5/2017    555     2

Problem is that I would like the average score from the last 3 DAYS (moving average) and not the last three tests. I have gone over the Redshift and Postgre Documentation and can't seem to find any way of doing it.

Desired Result:

date        ID      3_DAY_AVG
3/1/2017    123     1
3/1/2017    555     1
3/2/2017    123     1
3/3/2017    555     2
3/5/2017    555     2.5

Any direction would be appreciated.


Solution

  • You can use lag() and explicitly calculate the average.

    select t.*,
           (score +
            (case when lag(date, 1) over (partition by id order by date) >=
                       date - interval '2 day'
                  then lag(score, 1) over (partition by id order by date)
                  else 0
             end) +
            (case when lag(date, 2) over (partition by id order by date) >=
                       date - interval '2 day'
                  then lag(score, 2) over (partition by id order by date)
                  else 0
             end)
            )
           ) /
           (1 +
            (case when lag(date, 1) over (partition by id order by date) >=
                       date - interval '2 day'
                  then 1
                  else 0
             end) +
            (case when lag(date, 2) over (partition by id order by date) >=
                       date - interval '2 day'
                  then 1
                  else 0
             end)
           )
    from dataset t;