While preparing an answer to another question here, I coded up a query that contained multiple window functions having the same OVER(...)
clause. Results were as expected.
select ...
sum(sum(s.price)) over (partition by p.productid
order by c.date
rows between 6 preceding and current row)
/ nullif(
sum(count(s.price)) over(partition by p.productid
order by c.date
rows between 6 preceding and current row),
0)
as avg7DayPrice
...
When I attempted to replace the duplicate OVER(...)
clauses with a shared WINDOW
clause reference OVER Last7Days
, defined later as WINDOW Last7Days AS (...)
, the results were unexpectedly different. (See WINDOW
clause - new in 2022.)
select ...
sum(sum(s.price)) over last7days
/ nullif(sum(count(s.price)) over last7days, 0)
as avg7DayPrice
...
window last7days as (partition by p.productid
order by c.date
rows between 6 preceding and current row)
Full original query:
-- Inline window over() clauses
-- Results are as expected
with calendar as (
select min(date) as date, max(date) as endDate
from sales
union all
select dateadd(day, 1, date), endDate
from calendar
where date < enddate
),
products as (
select distinct productid
from sales
)
select
p.productid,
c.date as salesDate,
avg(price) as avg1DayPrice,
sum(sum(s.price)) over (partition by p.productid
order by c.date
rows between 6 preceding and current row)
/ nullif(
sum(count(s.price)) over(partition by p.productid
order by c.date
rows between 6 preceding and current row),
0)
as avg7DayPrice
from calendar c
cross join products p
left join sales s
on s.date = c.date
and s.productid = p.productid
group by p.productid, c.date
Full modified query:
-- Reference to common defined window clause
-- Expect same results, but that is not what I get.
with calendar as (
select min(date) as date, max(date) as endDate
from sales
union all
select dateadd(day, 1, date), endDate
from calendar
where date < enddate
),
products as (
select distinct productid
from sales
)
select
p.productid,
c.date as salesDate,
avg(price) as avg1DayPrice,
sum(sum(s.price)) over last7days
/ nullif(
sum(count(s.price)) over last7days,
0)
as avg7DayPrice
from calendar c
cross join products p
left join sales s
on s.date = c.date
and s.productid = p.productid
group by p.productid, c.date
window last7days as (partition by p.productid
order by c.date
rows between 6 preceding and current row)
Sales table:
productId | date | price |
---|---|---|
1 | 2025-02-01 | 10.00 |
1 | 2025-02-02 | 20.00 |
1 | 2025-02-02 | 30.00 |
1 | 2025-02-03 | 40.00 |
Original results:
productid | salesDate | avg1DayPrice | avg7DayPrice |
---|---|---|---|
1 | 2025-02-01 | 10.000000 | 10.000000 |
1 | 2025-02-02 | 25.000000 | 20.000000 |
1 | 2025-02-03 | 40.000000 | 25.000000 |
Modified results:
productid | salesDate | avg1DayPrice | avg7DayPrice |
---|---|---|---|
1 | 2025-02-01 | 10.000000 | 2.500000 |
1 | 2025-02-02 | 25.000000 | 15.000000 |
1 | 2025-02-03 | 40.000000 | 25.000000 |
See this db<>fiddle.
I do not understand why the results are different. It appears that the sum(count(s.price)) over(...)
part of the calculation is 1, 3, 4
for the first query, but 4, 4, 4
in the second. The execution plan is also significantly different. Can anyone explain this?
Can anyone explain this?
This is certainly a bug (reported here), a simpler repro is
SELECT grp,
val,
sum(val) over wf as input_to_nullif,
nullif(sum(val) over wf, 30) as wrong_result,
IIF(sum(val) over wf = 30, NULL, sum(val) over ()) as faulty_expansion,
IIF(sum(val) over wf = 30, NULL, sum(val) over wf) as correct_expansion
FROM (VALUES
(1, 1),
(1, 3),
(2, 10),
(2, 20),
(2, 20)
) V(grp, val)
window wf as (partition by grp order by val rows unbounded preceding);
which returns
grp | val | input_to_nullif | wrong_result | faulty_expansion | correct_expansion |
---|---|---|---|---|---|
1 | 1 | 1 | 54 | 54 | 1 |
1 | 3 | 4 | 54 | 54 | 4 |
2 | 10 | 10 | 54 | 54 | 10 |
2 | 20 | 30 | NULL | NULL | NULL |
2 | 20 | 50 | 54 | 54 | 50 |
The problem is that NULLIF
gets expanded out to an expression where the first argument appears twice and the window frame is only correctly preserved for the first reference.
i.e. instead of getting expanded out per the correct_expansion
above it gets expanded out to the faulty_expansion
variant.
Running the below...
SELECT
NULLIF(sum(val) over wf,-1) as wrong_result_nullif
FROM (VALUES
(1, 1),
(1, 3)
) V(grp, val)
window wf as (partition by grp order by val)
option (RECOMPILE, querytraceon 3604,querytraceon 8605);
shows this in part of the output (no window frame after the second stopSum
)
AncOp_PrjList
AncOp_PrjEl COL: Expr1006
ScaOp_AggFunc stopSum partition[CALC:COL: Union1004 TI(int,ML=4)] order[CALC:COL: Union1005 TI(int,ML=4) ASC] range between unbounded and current
ScaOp_Identifier COL: Union1005
AncOp_PrjEl COL: Expr1007
ScaOp_AggFunc stopSum
ScaOp_Identifier COL: Union1005