Search code examples
sqlsql-servert-sqlsql-server-2017

Calculating values with overlapping periods


I have a query that generates a monthly report for drivers evaluations. Some drivers may not have reports for some months. The reports contain various of violations, one of these violations is a monthly cumulated violation, and it resets every year which is done in in separate query. Everything where working just fine, except when the client added a new requirement to this part. The requirement is to change it from resetting every year, to resetting any violation that passed 180 days during the year.

Here is a report's sample (for one driver report):

RN                   ReportId             DriverId             StartDate               EndDate                 Level1Vio   Lv1YTD
-------------------- -------------------- -------------------- ----------------------- ----------------------- ----------- -----------
1                    64                   2073                 2020-10-21 00:00:00.000 2020-11-21 23:59:59.000 1           1
2                    65                   2073                 2020-11-24 05:13:04.133 2020-12-24 05:13:04.133 0           1
3                    67                   2073                 2020-12-23 06:53:52.870 2021-01-23 06:53:52.870 0           1
4                    68                   2073                 2021-01-22 06:33:43.127 2021-02-22 06:33:43.127 0           1
5                    69                   2073                 2021-02-23 04:02:58.680 2021-03-23 04:02:58.680 1           2
6                    70                   2073                 2021-03-22 23:39:33.570 2021-04-22 23:39:33.570 0           2
7                    71                   2073                 2021-04-22 00:28:35.230 2021-05-22 00:28:35.230 0           2
8                    72                   2073                 2021-05-22 15:46:21.767 2021-06-22 15:46:21.767 1           3
9                    73                   2073                 2021-06-25 06:42:02.130 2021-07-25 06:42:02.130 1           4
10                   76                   2073                 2021-07-23 17:42:01.533 2021-08-23 17:42:01.533 0           4

The old query that where working was (generates the above sample) :

SELECT 
    RN
,   ReportId
,   DriverId
,   StartDate
,   EndDate
,   Level1Vio
,   Lv1YTD = SUM(Level1Vio) OVER(PARTITION BY DriverId ORDER BY ReportId ROWS UNBOUNDED PRECEDING) 
FROM Report

The excepted result should be (for the new requirement):

RN                   ReportId             DriverId             StartDate               EndDate                 Level1Vio   Lv1YTD
-------------------- -------------------- -------------------- ----------------------- ----------------------- ----------- -----------
1                    64                   2073                 2020-10-21 00:00:00.000 2020-11-21 23:59:59.000 1           1
2                    65                   2073                 2020-11-24 05:13:04.133 2020-12-24 05:13:04.133 0           1
3                    67                   2073                 2020-12-23 06:53:52.870 2021-01-23 06:53:52.870 0           1
4                    68                   2073                 2021-01-22 06:33:43.127 2021-02-22 06:33:43.127 0           1
5                    69                   2073                 2021-02-23 04:02:58.680 2021-03-23 04:02:58.680 1           2
6                    70                   2073                 2021-03-22 23:39:33.570 2021-04-22 23:39:33.570 0           2
7                    71                   2073                 2021-04-22 00:28:35.230 2021-05-22 00:28:35.230 0           1
8                    72                   2073                 2021-05-22 15:46:21.767 2021-06-22 15:46:21.767 1           2
9                    73                   2073                 2021-06-25 06:42:02.130 2021-07-25 06:42:02.130 1           3
10                   76                   2073                 2021-07-23 17:42:01.533 2021-08-23 17:42:01.533 0           3

SQL Fiddle

Explanation

If you take ReportId = 64 (Nov 2020 Report) the driver has a registered violation, which should be removed after 180 days, and then in ReportId = 69(Mar 2021 Report) got another violation. The first violation should end at ReportId = 71(May 2021 Report) and the second one should be removed at ReportId = 76(Aug 2021 Report). This principle is applied on each violation.

Trys I'm trying hard to avoid cursor and while loops actually, but I couldn't find a way to do this under one query SELECT like what I did in the above query. I've tried CTE and joins, but with no luck. Everything forces me to use loops, which I try to avoid (because of performance issues). I'm sure there is a way to solve it with simple SELECT, but I'm out of thoughts. So, any thoughts or solutions would be much appreciated.


Solution

  • Instead of window functions, we can just join with the records which overlap by 180 days.

    Updated fiddle: Working Test Case

    SELECT t1.rn, t1.ReportId, t1.DriverId, t1.StartDate, t1.EndDate, t1.Level1Vio
         , SUM(t2.Level1Vio) AS Lv1YTD
      FROM Report AS t1
      JOIN Report AS t2
        ON t1.StartDate BETWEEN t2.StartDate AND DATEADD(day, 180, t2.StartDate)
       AND t1.DriverId = t2.DriverId
     GROUP BY t1.rn, t1.DriverId, t1.ReportId, t1.StartDate, t1.EndDate, t1.Level1Vio
     ORDER BY t1.StartDate
    ;
    

    Note: I used 180 days from the start of the reported violation. Adjust as needed.

    Also note: I included a version which uses a CTE term. You can use that if you wish to filter the rows of interest on both sides of the join more conveniently, restricting them to a specific year, or just add that year restriction to the join logic or add a WHERE clause.

    There are quite a few ways to include the year logic all within this query, without resorting to separate queries to generate per year.