I am migrating some legacy code from an Oracle DB to an SQL Server DB. I have come along this piece of code, with the purpose of calculating the cumulative amount over the last 6 months/ year, for each record/group:
select
a,b,c,d,e,trunc(record_date, 'MM'),
sum(sum(amount)) over(
partition by a, b, c, d
order by trunc(record_date, 'MM')
range between interval 5 month preceding and current row) semester_amount,
sum(sum(amount)) over(
partition by a, b, c, d
order by trunc(record_date, 'MM')
range between interval '11' month preceding and current row) year_amount
from table
group by
a,b,c,d,e,trunc(record_date, 'MM')
My goal is to implement it in SQL Server where the date range in window partitions isn't available.
Here is a sample table I use for experimentation:
drop table if exists dbo.test_window_frames;
create table dbo.test_window_frames (
fielda char(1),
fieldb char(1),
fieldc char(2),
fieldd char(2),
fielde char(2),
amount int,
record_date date);
insert into dbo.test_window_frames (fielda, fieldb, fieldc, fieldd, fielde, amount, record_date)
values
('A', 'X', 'C1', 'D2', 'E2', 74, '2024-03-10'),
('A', 'X', 'C1', 'D2', 'E2', 34, '2024-03-12'),
('A', 'X', 'C1', 'D2', 'E2', 24, '2024-03-12'),
('A', 'X', 'C1', 'D2', 'E1', 24, '2024-03-25'),
('A', 'X', 'C1', 'D2', 'E1', 95, '2024-06-20'),
('A', 'X', 'C1', 'D2', 'E1', 81, '2024-06-27'),
('A', 'X', 'C1', 'D2', 'E1', 83, '2024-07-10'),
('A', 'X', 'C1', 'D2', 'E2', 73, '2024-07-27'),
('A', 'X', 'C1', 'D2', 'E2', 60, '2024-10-07'),
('A', 'X', 'C1', 'D2', 'E2', 50, '2024-10-07'),
('A', 'X', 'C2', 'D1', 'E1', 67, '2024-04-02'),
('A', 'X', 'C2', 'D1', 'E1', 69, '2024-12-04'),
('B', 'X', 'C1', 'D1', 'E1', 50, '2024-01-15'),
('B', 'X', 'C1', 'D1', 'E2', 30, '2024-02-20'),
('B', 'X', 'C2', 'D2', 'E1', 80, '2024-03-22'),
('B', 'X', 'C2', 'D2', 'E2', 90, '2024-05-10'),
('B', 'X', 'C1', 'D1', 'E1', 55, '2024-06-01'),
('B', 'X', 'C2', 'D2', 'E2', 20, '2024-07-19'),
('B', 'X', 'C1', 'D1', 'E2', 40, '2024-08-14'),
('B', 'X', 'C1', 'D1', 'E2', 5, '2024-08-14'),
('B', 'X', 'C1', 'D2', 'E1', 65, '2024-09-25'),
('B', 'X', 'C2', 'D1', 'E2', 35, '2024-10-12'),
('B', 'X', 'C2', 'D2', 'E1', 75, '2024-11-05');
The desired result is for each group (for example ('A', 'X', 'C1', 'D2', 'E2', '2024-03-01') the date is truncated) to have a row with the amount of the past 6 months and the past year since that group's date. (132 for both in this case)
Until now I tried my luck with subqueries:
select
t1.fielda,
t1.fieldb,
t1.fieldc,
t1.fieldd,
t1.fielde,
dateadd(month, datediff(month, 0, t1.record_date), 0) as reference_date,
(
select sum(t2.amount)
from dbo.test_window_frames t2
where t2.fielda = t1.fielda
and t2.fieldb = t1.fieldb
and t2.fieldc = t1.fieldc
and t2.fieldd = t1.fieldd
and t2.record_date between dateadd(month, -5, t1.record_date) and t1.record_date
) as semester_amount,
(
select sum(t2.amount)
from dbo.test_window_frames t2
where t2.fielda = t1.fielda
and t2.fieldb = t1.fieldb
and t2.fieldc = t1.fieldc
and t2.fieldd = t1.fieldd
and t2.record_date between dateadd(month, -11, t1.record_date) and t1.record_date
) as year_amount
from
dbo.test_window_frames t1
group by
t1.fielda, t1.fieldb, t1.fieldc, t1.fieldd, t1.fielde, dateadd(month, datediff(month, 0, t1.record_date), 0);
But I get the following error:
Column 'dbo.test_window_frames.record_date' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Can you help me?
I attach the expected results that I believe are the output of the Oracle code, for the fieldA = 'A'.
fielda | fieldb | fieldc | fieldd | fielde | reference_date | semester_amount | year_amount |
---|---|---|---|---|---|---|---|
A | X | C1 | D2 | E2 | 2024-03-01 | 132 | 132 |
A | X | C1 | D2 | E1 | 2024-03-01 | 24 | 24 |
A | X | C1 | D2 | E1 | 2024-06-01 | 200 | 200 |
A | X | C1 | D2 | E1 | 2024-07-01 | 283 | 283 |
A | X | C1 | D2 | E2 | 2024-07-01 | 205 | 205 |
A | X | C1 | D2 | E2 | 2024-10-01 | 183 | 315 |
A | X | C2 | D1 | E1 | 2024-04-01 | 67 | 67 |
A | X | C2 | D1 | E1 | 2024-12-01 | 69 | 136 |
The following should produce the same results as your original Oracle query:
select
t1.fielda,
t1.fieldb,
t1.fieldc,
t1.fieldd,
t1.fielde,
rd.reference_date,
sum(t1.amount) as current_amount, -- *** Added for reference ***
(
select sum(t2.amount)
from dbo.test_window_frames t2
where t2.fielda = t1.fielda
and t2.fieldb = t1.fieldb
and t2.fieldc = t1.fieldc
and t2.fieldd = t1.fieldd
--and t2.fielde = t1.fielde -- *** ??? ***
and t2.record_date >= dateadd(month, -5, rd.reference_date) -- *** Modified ***
and t2.record_date < dateadd(month, 1, rd.reference_date) -- *** Modified ***
) as semester_amount,
(
select sum(t2.amount)
from dbo.test_window_frames t2
where t2.fielda = t1.fielda
and t2.fieldb = t1.fieldb
and t2.fieldc = t1.fieldc
and t2.fieldd = t1.fieldd
--and t2.fielde = t1.fielde -- *** ??? ***
and t2.record_date >= dateadd(month, -11, rd.reference_date) -- *** Modified ***
and t2.record_date < dateadd(month, 1, rd.reference_date) -- *** Modified ***
) as year_amount
from
dbo.test_window_frames t1
cross apply ( -- *** Added ***
select datetrunc(month, t1.record_date) as reference_date
--select dateadd(month, datediff(month, 0, t1.record_date), 0) as reference_date
) rd
group by
t1.fielda, t1.fieldb, t1.fieldc, t1.fieldd, t1.fielde, rd.reference_date
order by
t1.fielda, t1.fieldb, t1.fieldc, t1.fieldd, rd.reference_date, t1.fielde;
The first thing I did was to move the reference_date
calculation to a CROSS APPLY
so that it can be defined once and conveniently referenced several times elsewhere. For SQL Server 2022 and later, you can use the DATETRUNC()
function. For earlier versions, the dateadd(xxx, datediff(xxx, 0, ...), 0)
hack can still be used.
I also changed the date range in the subqueries to be half-open intervals >= current month - N
and < current month + 1
. This cleanly includes all dates through the end of the current month.
Lastly, I added the and t2.fielde = t1.fielde
condition to the subqueries, as this was needed to match your desired results. However, I later commented out this condition, as it does not match the original Oracle query.
Because this query performs multiple range-lookups in subselects (instead of the original Oracle window functions), it is essential that those lookups be supported by an appropriate index such as test_window_frames(fielda, fieldb, fieldc, fieldd, record_date) INCLUDE(amount)
. (The INCLUDE
part saves a bunch of clustered index seeks.)
Results:
fielda | fieldb | fieldc | fieldd | fielde | reference date |
current amount |
semester amount |
year amount |
---|---|---|---|---|---|---|---|---|
A | X | C1 | D2 | E1 | 2024-03-01 | 24 | 156 | 156 |
A | X | C1 | D2 | E2 | 2024-03-01 | 132 | 156 | 156 |
A | X | C1 | D2 | E1 | 2024-06-01 | 176 | 332 | 332 |
A | X | C1 | D2 | E1 | 2024-07-01 | 83 | 488 | 488 |
A | X | C1 | D2 | E2 | 2024-07-01 | 73 | 488 | 488 |
A | X | C1 | D2 | E2 | 2024-10-01 | 110 | 442 | 598 |
A | X | C2 | D1 | E1 | 2024-04-01 | 67 | 67 | 67 |
A | X | C2 | D1 | E1 | 2024-12-01 | 69 | 69 | 136 |
B | X | C1 | D1 | E1 | 2024-01-01 | 50 | 50 | 50 |
B | X | C1 | D1 | E2 | 2024-02-01 | 30 | 80 | 80 |
B | X | C1 | D1 | E1 | 2024-06-01 | 55 | 135 | 135 |
B | X | C1 | D1 | E2 | 2024-08-01 | 45 | 100 | 180 |
B | X | C1 | D2 | E1 | 2024-09-01 | 65 | 65 | 65 |
B | X | C2 | D1 | E2 | 2024-10-01 | 35 | 35 | 35 |
B | X | C2 | D2 | E1 | 2024-03-01 | 80 | 80 | 80 |
B | X | C2 | D2 | E2 | 2024-05-01 | 90 | 170 | 170 |
B | X | C2 | D2 | E2 | 2024-07-01 | 20 | 190 | 190 |
B | X | C2 | D2 | E1 | 2024-11-01 | 75 | 95 | 265 |
Note that the above does not match your posted results, but should match the original Oracle results. (See fiddle links below.)
If you uncomment the and t2.fielde = t1.fielde
condition, the results will match your posted desired results, but not the original Oracle results.
fielda | fieldb | fieldc | fieldd | fielde | reference date |
current amount |
semester amount |
year amount |
---|---|---|---|---|---|---|---|---|
A | X | C1 | D2 | E1 | 2024-03-01 | 24 | 24 | 24 |
A | X | C1 | D2 | E2 | 2024-03-01 | 132 | 132 | 132 |
A | X | C1 | D2 | E1 | 2024-06-01 | 176 | 200 | 200 |
A | X | C1 | D2 | E1 | 2024-07-01 | 83 | 283 | 283 |
A | X | C1 | D2 | E2 | 2024-07-01 | 73 | 205 | 205 |
A | X | C1 | D2 | E2 | 2024-10-01 | 110 | 183 | 315 |
A | X | C2 | D1 | E1 | 2024-04-01 | 67 | 67 | 67 |
A | X | C2 | D1 | E1 | 2024-12-01 | 69 | 69 | 136 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
A slightly different approach is to group and calculate the monthly summaries in a CTE. The semester_amount
and year_amount
lookups are then performed against the CTE instead of the original table as part of the final query. It is even possible to combine the two sum calculations by retrieving the prior 12-months of data once and using a variation of conditional aggregation to calculate two different sums.
with monthly_sums as (
select
t1.fielda,
t1.fieldb,
t1.fieldc,
t1.fieldd,
t1.fielde,
rd.reference_date,
sum(t1.amount) as current_amount
from
dbo.test_window_frames t1
cross apply (
select datetrunc(month, t1.record_date) as reference_date
) rd
group by
t1.fielda, t1.fieldb, t1.fieldc, t1.fieldd, t1.fielde, rd.reference_date
)
select
ms.fielda,
ms.fieldb,
ms.fieldc,
ms.fieldd,
ms.fielde,
ms.reference_date,
a.semester_amount,
a.year_amount
from
monthly_sums ms
cross apply (
select
sum(ms2.current_amount * x.is_semester) as semester_amount,
sum(ms2.current_amount) as year_amount
from monthly_sums ms2
cross apply (
select case when ms2.reference_date >= dateadd(month, -5, ms.reference_date)
then 1 else 0 end as is_semester
) x
where ms2.fielda = ms.fielda
and ms2.fieldb = ms.fieldb
and ms2.fieldc = ms.fieldc
and ms2.fieldd = ms.fieldd
and ms2.reference_date >= dateadd(month, -11, ms.reference_date)
and ms2.reference_date <= ms.reference_date -- Inclusive
) a
order by
ms.fielda, ms.fieldb, ms.fieldc, ms.fieldd, ms.reference_date, ms.fielde;
if you later add date-range filters, take care to include the extra 12 months of prior data in the CTE calculations to support the later semester_amount
and year_amount
calculations. The final query would include a filter to exclude the earlier data.
You should run performance tests on the various queries to determine what works best for your data. Neither is likely to be as performant as one that used a values-between window range, if only SQL Server would support that.
See this db<>fiddle for a demo of the above. For comparison, this db<>fiddle shows the results of the original Oracle query against the same posted sample data.