I have a table in SQL Server that tracks the status of different IDs over time. The table includes both historical data and current data, and I need to write a query that will return the correct status and date ranges for each ID, whether or not the status has changed.
The table structure is as follows:
CREATE TABLE #history_Card_status (
HistoryData BIT,
StatusUpdateDate DATETIME,
CreateDate DATETIME,
Id BIGINT,
Status NVARCHAR(50)
);
INSERT INTO #history_Card_status (HistoryData, StatusUpdateDate, CreateDate, Id, Status)
VALUES
-- Current Data
(0, NULL, '2023-06-13 05:07:38.700', 222, 'Open'),
(0, NULL, '2021-07-16 00:44:46.740', 111, 'Closed'),
-- Historical Data
(1, '2024-08-20 21:10:57.093', '2021-07-16 00:44:46.740', 111, 'Closed'),
(1, '2024-07-05 13:22:04.220', '2021-07-16 00:44:46.740', 111, 'Closed'),
(1, '2024-07-05 13:13:02.133', '2021-07-16 00:44:46.740', 111, 'Inactive/Block'),
(1, '2024-07-02 03:01:12.467', '2021-07-16 00:44:46.740', 111, 'Inactive/Block'),
(1, '2024-06-24 02:12:00.773', '2021-07-16 00:44:46.740', 111, 'Inactive/Block'),
(1, '2024-06-24 02:12:00.687', '2021-07-16 00:44:46.740', 111, 'Inactive/Block'),
(1, '2024-06-24 02:00:46.040', '2021-07-16 00:44:46.740', 111, 'Open'),
(1, '2024-06-24 02:00:44.303', '2021-07-16 00:44:46.740', 111, 'Open'),
(1, '2024-04-14 11:57:21.133', '2021-07-16 00:44:46.740', 111, 'Open'),
(1, '2024-04-14 11:52:09.073', '2021-07-16 00:44:46.740', 111, 'Open'),
(1, '2024-07-01 04:28:08.213', '2023-06-13 05:07:38.700', 222, 'Open'),
(1, '2024-07-01 03:39:54.607', '2023-06-13 05:07:38.700', 222, 'Open'),
(1, '2024-04-24 11:18:59.380', '2023-06-13 05:07:38.700', 222, 'Open'),
(1, '2024-04-24 11:18:59.227', '2023-06-13 05:07:38.700', 222, 'Open');
This table contains both historical records (marked with HistoryData = 1) and current records (marked with HistoryData = 0). The StatusUpdateDate column is filled only for historical records, while the CreateDate column is present in all records.
Problem
I need to create a query that returns the status and the corresponding date ranges for each ID. The query should handle both cases where the status changes and cases where it does not. The expected result should look like this:
StatusUpdateDate StatusTillDate Id Status
----------------------- ---------------------- ------ -----------
2024-07-05 13:13:02.133 2024-08-24 08:49:31.233 111 Closed
2024-06-24 02:00:46.040 2024-07-05 13:13:02.133 111 Inactive/Block
2021-07-16 00:44:46.740 2024-06-24 02:00:46.040 111 Open
2023-06-13 05:07:38.700 2024-08-24 08:49:31.233 222 Open
My Attempt
I started with a query that works for cases where status changes occur:
WITH Step1 AS (
SELECT
StatusUpdateDate = ISNULL(StatusUpdateDate, ISNULL(history.CreateDate, '1970-01-01 00:00:00.000')),
Id,
Status,
lag_status = LAG(Status) OVER (PARTITION BY Id ORDER BY StatusUpdateDate DESC),
ChangeFlag = CASE
WHEN Status <> LAG(Status) OVER (PARTITION BY Id ORDER BY StatusUpdateDate DESC)
THEN 1 ELSE 0 END
FROM #history_Card_status AS history
)
SELECT
StatusUpdateDate,
StatusTillDate = ISNULL(LAG(StatusUpdateDate) OVER (PARTITION BY Id ORDER BY StatusUpdateDate DESC), DATEADD(DAY, 2, GETDATE())),
Id,
Status = ISNULL(LAG(Status) OVER (PARTITION BY Id ORDER BY StatusUpdateDate DESC), lag_status)
FROM Step1
WHERE ChangeFlag = 1;
This query works well for IDs where a status change has occurred, but I only see IDs with changes, like this:
StatusUpdateDate StatusTillDate Id Status
----------------------- ---------------------- ------ -----------
2024-07-05 13:13:02.133 2024-08-24 06:22:39.003 111 Closed
2024-06-24 02:00:46.040 2024-07-05 13:13:02.133 111 Inactive/Block
2021-07-16 00:44:46.740 2024-06-24 02:00:46.040 111 Open
Then, I tried to handle the rows where changes did not take place:
WITH Step1 AS (
SELECT
StatusUpdateDate = ISNULL(StatusUpdateDate, ISNULL(history.CreateDate, '1970-01-01 00:00:00.000')),
Id,
Status,
lag_status = LAG(Status) OVER (PARTITION BY Id ORDER BY StatusUpdateDate DESC),
ChangeFlag = CASE
WHEN Status <> LAG(Status) OVER (PARTITION BY Id ORDER BY StatusUpdateDate DESC)
THEN 1 ELSE 0 END,
rownum = ROW_NUMBER() OVER (PARTITION BY Id ORDER BY StatusUpdateDate DESC)
FROM #history_Card_status AS history
)
SELECT
StatusUpdateDate,
StatusTillDate = ISNULL(LAG(StatusUpdateDate) OVER (PARTITION BY Id ORDER BY StatusUpdateDate DESC), DATEADD(DAY, 2, GETDATE())),
Id,
ChangeFlag,
Status = ISNULL(LAG(Status) OVER (PARTITION BY Id ORDER BY StatusUpdateDate DESC), lag_status),
rownum
FROM Step1
WHERE ChangeFlag = 1
OR (ChangeFlag = 0 AND rownum = 1);
This attempt, however, returns disappointing results like:
StatusUpdateDate StatusTillDate Id ChangeFlag Status
----------------------- ---------------------- ------ ----------- -----------
2024-08-20 21:10:57.093 2024-08-24 16:44:22.340 111 0 NULL
2024-07-05 13:13:02.133 2024-08-20 21:10:57.093 111 1 Closed
2024-06-24 02:00:46.040 2024-07-05 13:13:02.133 111 1 Inactive/Block
2021-07-16 00:44:46.740 2024-06-24 02:00:46.040 111 1 Open
2024-07-01 04:28:08.213 2024-08-24 16:44:22.340 222 0 NULL
What I'm Looking For
I'm looking for a way to return the correct status and date ranges for each ID, including those without any status changes, as shown in the expected results.
This looks like a standard gaps and island, with some small wrinkles, like the start date:
SELECT
MIN(CASE WHEN groupflag = 0 THEN CreateDate ELSE StatusUpdateDate END) AS minDate
, MAX(ISNULL(StatusUpdateDate, DATEADD(DAY, 2, GETDATE()))) AS maxDate
, ID, status, groupFlag
FROM (
SELECT *
, SUM(flag) OVER(partition BY ID ORDER BY historydata DESC, statusupdatedate rows BETWEEN unbounded preceding AND CURRENT row) AS groupFlag
FROM (
SELECT
CASE WHEN status <> lag(status) OVER(partition BY id ORDER BY historydata DESC, statusupdatedate) THEN 1 ELSE 0 END AS flag
, *
FROM #history_Card_status h
) x
) x
GROUP BY groupflag, id, status
ORDER BY id, groupflag DESC
I usually use the CASE WHEN something <> lag(something) then 1 else 0 end
which takes cares of the first row.
Then, we summarize the above flag to create the grouping. For performance I use ROWS BETWEEN
, otherwise it defaults to RANGE
which works a little slower.
Finally, I take the CreateDate
for the first row to create the anchor date, and StatusUpdateDate
for the others.
Output:
minDate | maxDate | ID | status | groupFlag |
---|---|---|---|---|
2024-07-05 13:22:04.220 | 2024-08-25 00:04:20.760 | 111 | Closed | 2 |
2024-06-24 02:12:00.687 | 2024-07-05 13:13:02.133 | 111 | Inactive/Block | 1 |
2021-07-16 00:44:46.740 | 2024-06-24 02:00:46.040 | 111 | Open | 0 |
2023-06-13 05:07:38.700 | 2024-08-25 00:04:20.760 | 222 | Open | 0 |