Search code examples
sql-serverdatewindowaggregate-functions

Implementing range between dates in a window


I am migrating some legacy code from an Oracle DB to an SQL Server DB. I have come along this piece of code, with the purpose of calculating the cumulative amount over the last 6 months/ year, for each record/group:

select
     a,b,c,d,e,trunc(record_date, 'MM'),
        sum(sum(amount)) over(
            partition by a, b, c, d 
            order by trunc(record_date, 'MM')
            range between interval 5 month preceding and current row) semester_amount,
        sum(sum(amount)) over(
            partition by a, b, c, d
            order by trunc(record_date, 'MM')
            range between interval '11' month preceding and current row) year_amount
from table
group by
      a,b,c,d,e,trunc(record_date, 'MM')

My goal is to implement it in SQL Server where the date range in window partitions isn't available.

Here is a sample table I use for experimentation:

drop table if exists dbo.test_window_frames;
create table dbo.test_window_frames (
    fielda char(1),
    fieldb char(1),
    fieldc char(2),
    fieldd char(2),
    fielde char(2),
    amount int,
    record_date date);
insert into dbo.test_window_frames (fielda, fieldb, fieldc, fieldd, fielde, amount, record_date)
values
    ('A', 'X', 'C1', 'D2', 'E2', 74, '2024-03-10'),
    ('A', 'X', 'C1', 'D2', 'E2', 34, '2024-03-12'),
    ('A', 'X', 'C1', 'D2', 'E2', 24, '2024-03-12'),
    ('A', 'X', 'C1', 'D2', 'E1', 24, '2024-03-25'),
    ('A', 'X', 'C1', 'D2', 'E1', 95, '2024-06-20'),
    ('A', 'X', 'C1', 'D2', 'E1', 81, '2024-06-27'),
    ('A', 'X', 'C1', 'D2', 'E1', 83, '2024-07-10'),
    ('A', 'X', 'C1', 'D2', 'E2', 73, '2024-07-27'),
    ('A', 'X', 'C1', 'D2', 'E2', 60, '2024-10-07'),
    ('A', 'X', 'C1', 'D2', 'E2', 50, '2024-10-07'),
    ('A', 'X', 'C2', 'D1', 'E1', 67, '2024-04-02'),
    ('A', 'X', 'C2', 'D1', 'E1', 69, '2024-12-04'),
    ('B', 'X', 'C1', 'D1', 'E1', 50, '2024-01-15'),
    ('B', 'X', 'C1', 'D1', 'E2', 30, '2024-02-20'),
    ('B', 'X', 'C2', 'D2', 'E1', 80, '2024-03-22'),
    ('B', 'X', 'C2', 'D2', 'E2', 90, '2024-05-10'),
    ('B', 'X', 'C1', 'D1', 'E1', 55, '2024-06-01'),
    ('B', 'X', 'C2', 'D2', 'E2', 20, '2024-07-19'),
    ('B', 'X', 'C1', 'D1', 'E2', 40, '2024-08-14'),
    ('B', 'X', 'C1', 'D1', 'E2', 5, '2024-08-14'),
    ('B', 'X', 'C1', 'D2', 'E1', 65, '2024-09-25'),
    ('B', 'X', 'C2', 'D1', 'E2', 35, '2024-10-12'),
    ('B', 'X', 'C2', 'D2', 'E1', 75, '2024-11-05');

The desired result is for each group (for example ('A', 'X', 'C1', 'D2', 'E2', '2024-03-01') the date is truncated) to have a row with the amount of the past 6 months and the past year since that group's date. (132 for both in this case)

Until now I tried my luck with subqueries:

select 
    t1.fielda,
    t1.fieldb,
    t1.fieldc,
    t1.fieldd,
    t1.fielde,
    dateadd(month, datediff(month, 0, t1.record_date), 0) as reference_date,
    (
        select sum(t2.amount)
        from dbo.test_window_frames t2
        where t2.fielda = t1.fielda
          and t2.fieldb = t1.fieldb
          and t2.fieldc = t1.fieldc
          and t2.fieldd = t1.fieldd
          and t2.record_date between dateadd(month, -5, t1.record_date) and t1.record_date
    ) as semester_amount,
    (
        select sum(t2.amount)
        from dbo.test_window_frames t2
        where t2.fielda = t1.fielda
          and t2.fieldb = t1.fieldb
          and t2.fieldc = t1.fieldc
          and t2.fieldd = t1.fieldd
          and t2.record_date between dateadd(month, -11, t1.record_date) and t1.record_date
    ) as year_amount
from 
    dbo.test_window_frames t1
group by 
    t1.fielda, t1.fieldb, t1.fieldc, t1.fieldd, t1.fielde, dateadd(month, datediff(month, 0, t1.record_date), 0);

But I get the following error:

Column 'dbo.test_window_frames.record_date' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

Can you help me?

I attach the expected results that I believe are the output of the Oracle code, for the fieldA = 'A'.

fielda fieldb fieldc fieldd fielde reference_date semester_amount year_amount
A X C1 D2 E2 2024-03-01 132 132
A X C1 D2 E1 2024-03-01 24 24
A X C1 D2 E1 2024-06-01 200 200
A X C1 D2 E1 2024-07-01 283 283
A X C1 D2 E2 2024-07-01 205 205
A X C1 D2 E2 2024-10-01 183 315
A X C2 D1 E1 2024-04-01 67 67
A X C2 D1 E1 2024-12-01 69 136

Solution

  • The following should produce the same results as your original Oracle query:

    select 
        t1.fielda,
        t1.fieldb,
        t1.fieldc,
        t1.fieldd,
        t1.fielde,
        rd.reference_date,
        sum(t1.amount) as current_amount, -- *** Added for reference ***
        (
            select sum(t2.amount)
            from dbo.test_window_frames t2
            where t2.fielda = t1.fielda
              and t2.fieldb = t1.fieldb
              and t2.fieldc = t1.fieldc
              and t2.fieldd = t1.fieldd
              --and t2.fielde = t1.fielde -- *** ??? ***
              and t2.record_date >= dateadd(month, -5, rd.reference_date) -- *** Modified ***
              and t2.record_date < dateadd(month, 1, rd.reference_date)   -- *** Modified ***
        ) as semester_amount,
        (
            select sum(t2.amount)
            from dbo.test_window_frames t2
            where t2.fielda = t1.fielda
              and t2.fieldb = t1.fieldb
              and t2.fieldc = t1.fieldc
              and t2.fieldd = t1.fieldd
              --and t2.fielde = t1.fielde -- *** ??? ***
              and t2.record_date >= dateadd(month, -11, rd.reference_date) -- *** Modified ***
              and t2.record_date < dateadd(month, 1, rd.reference_date)    -- *** Modified ***
        ) as year_amount
    from 
        dbo.test_window_frames t1
    cross apply ( -- *** Added ***
        select datetrunc(month, t1.record_date) as reference_date
        --select dateadd(month, datediff(month, 0, t1.record_date), 0) as reference_date
    ) rd
    group by 
        t1.fielda, t1.fieldb, t1.fieldc, t1.fieldd, t1.fielde, rd.reference_date
    order by
        t1.fielda, t1.fieldb, t1.fieldc, t1.fieldd, rd.reference_date, t1.fielde;
    
    

    The first thing I did was to move the reference_date calculation to a CROSS APPLY so that it can be defined once and conveniently referenced several times elsewhere. For SQL Server 2022 and later, you can use the DATETRUNC() function. For earlier versions, the dateadd(xxx, datediff(xxx, 0, ...), 0) hack can still be used.

    I also changed the date range in the subqueries to be half-open intervals >= current month - N and < current month + 1. This cleanly includes all dates through the end of the current month.

    Lastly, I added the and t2.fielde = t1.fielde condition to the subqueries, as this was needed to match your desired results. However, I later commented out this condition, as it does not match the original Oracle query.

    Because this query performs multiple range-lookups in subselects (instead of the original Oracle window functions), it is essential that those lookups be supported by an appropriate index such as test_window_frames(fielda, fieldb, fieldc, fieldd, record_date) INCLUDE(amount). (The INCLUDE part saves a bunch of clustered index seeks.)

    Results:

    fielda fieldb fieldc fieldd fielde reference
    date
    current
    amount
    semester
    amount
    year
    amount
    A X C1 D2 E1 2024-03-01 24 156 156
    A X C1 D2 E2 2024-03-01 132 156 156
    A X C1 D2 E1 2024-06-01 176 332 332
    A X C1 D2 E1 2024-07-01 83 488 488
    A X C1 D2 E2 2024-07-01 73 488 488
    A X C1 D2 E2 2024-10-01 110 442 598
    A X C2 D1 E1 2024-04-01 67 67 67
    A X C2 D1 E1 2024-12-01 69 69 136
    B X C1 D1 E1 2024-01-01 50 50 50
    B X C1 D1 E2 2024-02-01 30 80 80
    B X C1 D1 E1 2024-06-01 55 135 135
    B X C1 D1 E2 2024-08-01 45 100 180
    B X C1 D2 E1 2024-09-01 65 65 65
    B X C2 D1 E2 2024-10-01 35 35 35
    B X C2 D2 E1 2024-03-01 80 80 80
    B X C2 D2 E2 2024-05-01 90 170 170
    B X C2 D2 E2 2024-07-01 20 190 190
    B X C2 D2 E1 2024-11-01 75 95 265

    Note that the above does not match your posted results, but should match the original Oracle results. (See fiddle links below.)

    If you uncomment the and t2.fielde = t1.fielde condition, the results will match your posted desired results, but not the original Oracle results.

    fielda fieldb fieldc fieldd fielde reference
    date
    current
    amount
    semester
    amount
    year
    amount
    A X C1 D2 E1 2024-03-01 24 24 24
    A X C1 D2 E2 2024-03-01 132 132 132
    A X C1 D2 E1 2024-06-01 176 200 200
    A X C1 D2 E1 2024-07-01 83 283 283
    A X C1 D2 E2 2024-07-01 73 205 205
    A X C1 D2 E2 2024-10-01 110 183 315
    A X C2 D1 E1 2024-04-01 67 67 67
    A X C2 D1 E1 2024-12-01 69 69 136
    ... ... ... ... ... ... ... ... ...

    A slightly different approach is to group and calculate the monthly summaries in a CTE. The semester_amount and year_amount lookups are then performed against the CTE instead of the original table as part of the final query. It is even possible to combine the two sum calculations by retrieving the prior 12-months of data once and using a variation of conditional aggregation to calculate two different sums.

    with monthly_sums as (
        select 
            t1.fielda,
            t1.fieldb,
            t1.fieldc,
            t1.fieldd,
            t1.fielde,
            rd.reference_date,
            sum(t1.amount) as current_amount
        from 
            dbo.test_window_frames t1
        cross apply (
            select datetrunc(month, t1.record_date) as reference_date
        ) rd
        group by 
            t1.fielda, t1.fieldb, t1.fieldc, t1.fieldd, t1.fielde, rd.reference_date
    )
    select 
        ms.fielda,
        ms.fieldb,
        ms.fieldc,
        ms.fieldd,
        ms.fielde,
        ms.reference_date,
        a.semester_amount,
        a.year_amount
    from 
        monthly_sums ms
    cross apply (
        select
            sum(ms2.current_amount * x.is_semester) as semester_amount,
            sum(ms2.current_amount) as year_amount
        from monthly_sums ms2
        cross apply (
            select case when ms2.reference_date >= dateadd(month, -5, ms.reference_date)
                   then 1 else 0 end as is_semester
        ) x
        where ms2.fielda = ms.fielda
          and ms2.fieldb = ms.fieldb
          and ms2.fieldc = ms.fieldc
          and ms2.fieldd = ms.fieldd
          and ms2.reference_date >= dateadd(month, -11, ms.reference_date)
          and ms2.reference_date <= ms.reference_date  -- Inclusive
    ) a
    order by
        ms.fielda, ms.fieldb, ms.fieldc, ms.fieldd, ms.reference_date, ms.fielde;
    

    if you later add date-range filters, take care to include the extra 12 months of prior data in the CTE calculations to support the later semester_amount and year_amount calculations. The final query would include a filter to exclude the earlier data.

    You should run performance tests on the various queries to determine what works best for your data. Neither is likely to be as performant as one that used a values-between window range, if only SQL Server would support that.

    See this db<>fiddle for a demo of the above. For comparison, this db<>fiddle shows the results of the original Oracle query against the same posted sample data.