I am writing a stored procedure in T-SQL. The output of this procedure feeds a GUI, telling it where to put blocks of text. The blocks can't overlap, and all the logic is to be done in SQL. It's supposed to do something simple sounding, but has been infuriatingly difficult.
Imagine a matrix with line numbers (line1, line2, etc.) on the left, and week numbers (week 1, week 2, week 3) at the top. If one banner overlaps another on date range, it's supposed to drop to the next line. It should always be assigned to the lowest line number where there is no date overlap.
The part I'm struggling with is assigning the line numbers. The weeks can be ignored.
An example of how it should work:
ID | StartDate | EndDate |
---|---|---|
1 | 2023-03-25 | 2023-04-28 |
2 | 2023-04-01 | 2023-05-02 |
3 | 2023-05-05 | 2023-05-17 |
4 | 2023-05-15 | 2023-06-08 |
5 | 2023-05-17 | 2023-06-09 |
6 | 2023-06-15 | 2023-07-09 |
The above data should be assigned line numbers from oldest to newest.
I've tried window functions, cursors, and all kinds of crazy cross joins, recursive CTEs and subqueries, but I just can't get it to work like I want to. I end up with Line Numbers of 1 for all of them, or NULL line numbers, or just 1's and 2's which still overlap. Any help would be greatly appreciated.
This cannot simply be pivoted out, since it can't overlap with other entries on the same line. Some sample data below. The ExpectedLineNumber here is what the Line Number should be if calculated correctly to display the data right on the GUI.
DROP TABLE IF EXISTS #TheDates
CREATE TABLE #TheDates
(
ID INT
,StartDate DATE
,EndDate DATE
,ExpectedLineNumber INT
)
INSERT INTO #TheDates (ID, StartDate, EndDate, ExpectedLineNumber) VALUES (1, '2023-03-25', '2023-04-28', 1)
INSERT INTO #TheDates (ID, StartDate, EndDate, ExpectedLineNumber) VALUES (2, '2023-04-01', '2023-05-02', 2)
INSERT INTO #TheDates (ID, StartDate, EndDate, ExpectedLineNumber) VALUES (3, '2023-05-05', '2023-05-17', 1)
INSERT INTO #TheDates (ID, StartDate, EndDate, ExpectedLineNumber) VALUES (4, '2023-05-15', '2023-06-08', 2)
INSERT INTO #TheDates (ID, StartDate, EndDate, ExpectedLineNumber) VALUES (5, '2023-05-18', '2023-06-11', 3)
INSERT INTO #TheDates (ID, StartDate, EndDate, ExpectedLineNumber) VALUES (6, '2023-06-15', '2023-07-09', 1)
@Dave ,
First, thank you for the "Readily Consumable Test Data". Is really does make things a lot easier for the people that would help you... especially when it actually works!
Here's a trick that Itzik Ben-Gan published quite a while back for "Packing Intervals" in an article at the following link...
https://www.itprotoday.com/sql-server/new-solution-packing-intervals-problem
And, using your good test data, here's the solution using the "Packing Intervals" method to solve your problem. Details as to how it works are in the comments and the output contains the working columns so that you can see how it works. See the final comment in the code for what to do for just the columns you actually want to return.
As an important performance/resource usage sidebar, the code below uses only a single logical read (the date DOES fit on just one page). The Recursive CTE method that was posted in the comments of your post used 37.
WITH
ctePrevEndDate AS
(--==== Find the prev end date from the temporally previous row according to StartDate/EndDate.
-- We use MAX() here as Itzik did but a LAG would probably work, as well.
SELECT *
,PrevEndDate = MAX(EndDate) OVER (ORDER BY StartDate, EndDate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
FROM #TheDates
)
,cteDateGrp AS
(--==== Assign a group number to overlapping groups of dates by marking the first row of
-- each group with a "1" and then summing them up over the rows that follow.
SELECT *
,DateGrp = SUM(isg.IsGrpStart) OVER (ORDER BY StartDate, EndDate
ROWS UNBOUNDED PRECEDING)
FROM ctePrevEndDate
-- Marks the first value of each date group with a "1".
CROSS APPLY (VALUES(IIF(StartDate <= PrevEndDate,0,1)))isg(IsGrpStart)
)
--===== Assign the line number by partitions formed by the date group.
-- The * is to show all the working columns for demonstration purposes.
-- Change it to only the columns you want to include.
SELECT *
,LineNumber = ROW_NUMBER() OVER (PARTITION BY DateGrp
ORDER BY StartDate, EndDate)
FROM cteDateGrp
ORDER BY StartDate, EndDate
;
And, here's the output from that code using your test data...