Search code examples
sql-servergaps-and-islands

Dates across multiple rows


I have a table that has the following columns: Item, Date, Status Description, and Stock Status. It contains historical data for the stocking status of items.

Below is a brief sample of the input in table form.

Item Date Status Description Stock Status
ABC123 2020-10-02 Listed Out of Stock (ABC123) Out of Stock
ABC123 2020-10-15 In Stock (ABC123) In Stock
ABC123 2021-05-04 Listed Out of Stock (ABC123) Out of Stock
ABC123 2021-07-15 Listed Out of Stock (ABC123) Out of Stock
ABC123 2021-07-27 Listed Out of Stock (ABC123) Out of Stock
ABC123 2021-08-09 Listed Out of Stock (ABC123) Out of Stock
ABC123 2021-10-19 In Stock (ABC123) In Stock
... ... ... ...

And here is a script to create the test input data into a temp table.

DROP TABLE #OOS_History;

CREATE TABLE #OOS_History 
(
    [Item] NVARCHAR(100),
    [Date] DATETIME2,
    [Status Description] NVARCHAR(100),
    [Stock Status] NVARCHAR(25)
);

INSERT INTO #OOS_History ( [Item], [Date], [Status Description], [Stock Status] ) 
VALUES  
    ('ABC123',  '10/2/2020 13:53',  'Listed Out of Stock (ABC123)', 'Out of Stock'),
    ('ABC123',  '10/15/2020 9:20',  'In Stock (ABC123)',    'In Stock'),
    ('ABC123',  '5/4/2021 8:22',    'Listed Out of Stock (ABC123)', 'Out of Stock'),
    ('ABC123',  '7/15/2021 13:47',  'Listed Out of Stock (ABC123)', 'Out of Stock'),
    ('ABC123',  '7/27/2021 8:04', 'Listed Out of Stock (ABC123)',   'Out of Stock'),
    ('ABC123',  '8/9/2021 13:12', 'Listed Out of Stock (ABC123)',   'Out of Stock'),
    ('ABC123',  '10/19/2021 8:04',  'In Stock (ABC123)',    'In Stock'),
    ('ABC123',  '10/28/2021 11:52', 'Listed Out of Stock (ABC123)', 'Out of Stock'),
    ('ABC123',  '10/29/2021 9:24',  'In Stock (ABC123)',    'In Stock'),
    ('ABC123',  '12/6/2021 7:00', 'Listed Out of Stock (ABC123)',   'Out of Stock'),
    ('ABC123',  '12/6/2021 7:02', 'Listed Out of Stock (ABC123)',   'Out of Stock'),
    ('ABC123',  '12/15/2021 10:47', 'Listed Out of Stock (ABC123)', 'Out of Stock'),
    ('ABC123',  '2/21/2022 14:25',  'In Stock (ABC123)',    'In Stock'),
    ('ABC123',  '4/7/2022 8:36',    'Listed Out of Stock (ABC123)', 'Out of Stock'),
    ('ABC123',  '4/13/2022 7:39', 'In Stock (ABC123)',  'In Stock'),
    ('ABC123',  '4/19/2022 13:06',  'Listed Out of Stock (ABC123)', 'Out of Stock'),
    ('ABC123',  '4/22/2022 14:07',  'In Stock (ABC123)',    'In Stock'),
    ('ABC123',  '4/28/2022 11:30',  'Listed Out of Stock (ABC123)', 'Out of Stock'),
    ('ABC123',  '5/21/2022 6:25', 'In Stock (ABC123)',  'In Stock'),
    ('ABC123',  '7/12/2022 14:10',  'Listed Out of Stock (ABC123)', 'Out of Stock');

My goal is to create a new table that has the following 3 columns: Item, Start Date, and End Date. Each row will represent an interval that an item is out of stock.

Below is the desired output which should list the intervals (start and end date) that an item was out of stock.

Item Start Date End Date
ABC123 2020-10-02 2020-10-15
ABC123 2021-05-04 2021-10-19
ABC123 2021-10-28 2021-10-29
ABC123 2021-12-06 2021-12-15
ABC123 2022-04-07 2022-04-13
ABC123 2022-04-19 2022-04-22
ABC123 2022-04-28 2022-05-21
ABC123 2022-07-12 NULL

I believe I can do this by finding the groups that each row belongs in. In this case there should be 8 groups (rows 1-2), (rows 3-7), (rows 8-9), (rows 10-13), (rows 14-15), (rows 16-17), (rows 18-19), and (row 20). Then I can take the first row from each group as the starting date and the last row from each group as the ending date.

I've done some preliminary research and this appears to be an islands and gaps problem. I've found an answer on stackoverflow that looks like a good starting point but I'm stuck on how to apply it to my problem.

I'm copying the SQL query structure from the answer on this stackoverflow post (Lag() with condition in sql server) in an attempt to find the groups within my data.

The script below containing the CTE shows what I have so far.

WITH Grouped AS 
(
    SELECT 
        [Item],
        [Date],
        [Status Description],
        [Stock Status],
        ROW_NUMBER() OVER(ORDER BY [Date] ASC) AS 'rn1',
        ROW_NUMBER() OVER(PARTITION BY CASE [Stock Status] WHEN 'Out of Stock' THEN 1 END ORDER BY [Date] ASC) AS 'rn2' -- this line needs changed to fix my grouping issue?
    FROM #OOS_History
),

OrderInGroup AS
(
    SELECT
        [Item],
        [Date],
        [Status Description],
        [Stock Status],
        [rn1],
        [rn2],
        [rn1] - [rn2] AS 'GroupNumber',
        ROW_NUMBER() OVER(PARTITION BY [rn1]-[rn2] ORDER BY [Date] ASC) AS rank_asc,
        ROW_NUMBER() OVER(PARTITION BY [rn1]-[rn2] ORDER BY [Date] DESC) AS rank_desc
    FROM Grouped
)

SELECT 
    *,
    LAG([Stock Status], rank_asc) OVER(ORDER BY [Date] ASC) AS 'LastEventOfPrevGroup',
    LEAD([Stock Status], rank_desc) OVER(ORDER BY [Date] DESC) AS 'FirstEventOfNextGroup'
FROM OrderInGroup
ORDER BY [Date] ASC;

Attached is an image that shows the output of the above CTE query. The numbered red boxes around the rows indicate my desired grouping of the rows. The "GroupNumber" column indicates the actual grouping of the rows and it is not correct.

I'm assuming the issue lies within the "rn2" window function but I can't figure out what to change within the partition by and/or order by to get the desired grouping.

enter image description here


Solution

  • Here's a more intuitive approach, in my opinion:

    WITH
        OOS_History_With_Status_Change_Flag AS 
            (
            SELECT
                *,
                CASE 
                    WHEN LAG([Stock Status], 1, 1) OVER (ORDER BY [Date]) <> [Stock Status]
                        THEN 1 -- i.e. the status changed on this date
                END AS [Status Change]
            FROM
                #OOS_History
            ),
            
        OOS_History_Only_Status_Changes AS
            (
            SELECT
                *,
                CAST([Date] AS DATE) AS [Start Date],
                CAST(LEAD([Date], 1) OVER (ORDER BY [Date]) AS DATE) AS [End Date]
            FROM
                OOS_History_With_Status_Change_Flag
            WHERE
                [Status Change] = 1
            )
    
    SELECT
        [Item],
        [Start Date],
        [End Date]
    FROM
        OOS_History_Only_Status_Changes
    WHERE
        [Stock Status] = 'Out of Stock'
    ORDER BY
        [Start Date]
    ;
    

    In essence, the only rows you care about from your initial dataset are ones that represent a status change (i.e. rows with a lagging row, when ordered by date, of differing status). The first CTE creates this flag column using the LAG() function; the second CTE uses this flag as a filter and defines the Start Date and End Date columns.

    And just like that, you're done! Select rows where the status is out-of-stock, and you're golden.