Search code examples
sqlsql-serveraveragedateadd

Dateadd for moving 7 day avg VS (partition by price order by date rows between 6 preceding and current row)


Will the statement

Select productid, avg(price) as avgPrice, date as salesDate
from sales
where date between getdate() and dateadd(day, -7, getdate)
group by date

return the same results as

select productid, 
 avg(price) over (partition by productid order by date rows between 6 preceding and current row) 
 as avgPrice,
 date as salesDate
from sales
group by productid, date

Solution

  • The expression avg(price) over(...) is not valid in your original post, because a window function must reference a value that would otherwise be valid in the select list. Since price is not part of the group by, it is not valid in that context.

    The expression avg(avg(price)) over(...) would be valid (since avg(price) is an allowed select list item), but that would not give the correct result because the average of averages of subsets is not the same as the average of the whole set.

    Something like sum(sum(price)) over(...) / sum(count(*)) over(...) might work, but if there are gaps in sale dates, your rows range might reach back more than than the intended 7 days. You would need incorporate a calendar table or date generator into you solution to fill in any missing dates for each product.

    The following will generate a range of dates, cross join them with a distinct list of product IDs, and then left join that combination with the sales data. The combined data is then grouped and aggregated to produce the desired results.

    -- Correct results are obtained by using a calendar table to fill in any date gaps.
    with calendar as (
        select min(date) as date, max(date) as endDate
        from sales
        union all
        select dateadd(day, 1, date), endDate
        from calendar
        where date < enddate
    ),
    products as (
        select distinct productid
        from sales
    )
    select
        p.productid,
        c.date as salesDate, 
        count(*) as numSales,
        avg(price) as avg1DayPrice,
        sum(sum(s.price)) over (partition by p.productid
                                order by c.date
                                rows between 6 preceding and current row)
            / sum(count(s.price)) over(partition by p.productid
                                       order by c.date
                                       rows between 6 preceding and current row)
            as avg7DayPrice
    from calendar c
    cross join products p
    left join sales s
        on s.date = c.date
        and s.productid = p.productid
    group by p.productid, c.date
    order by p.productid, c.date
    

    If you are using SQL Server 2022 (or later), you can move the repeated OVER specifications to a named WINDOW specification.

    select ...
        sum(sum(s.price)) over last7days
            / sum(count(s.price)) over last7days
            as avg7DayPrice
    ...
    window last7days as (partition by p.productid
                         order by c.date
                         rows between 6 preceding and current row)
    ...
    

    An altogether different approach is to use a subquery instead of the window functions to calculate the 7-day average.

    select
        s.productid,
        s.date as salesDate,
        count(*) as numSales,
        avg(s.price) as avg1DayPrice,
        (
            select avg(s2.price)
            from sales s2
            where s2.productid = s.productid
            and s2.date between dateadd(day, -6, s.date) and s.date
        ) as avg7DayPrice
    from sales s
    group by s.productid, s.date
    order by s.productid, s.date
    

    See this db<>fiddle for a demo of the various techniques discussed, with some simple test data.