Search code examples
sql-serverwindow-functionsgaps-and-islands

How to combine intervals across multiple rows?


I have a table that has items, stock status (out of stock or in stock), and the starting and ending date range for the stocking status of the item.

SELECT
    [Item Number],
    [Start Date],
    [End Date],
    [Stock Status]
FROM HistoricalStockStatus
ORDER BY [Item Number], [Start Date]

The table below shows the data returned from the above query. You'll notice that there can be multiple rows for the same interval for the same stocking status. For example, look at the first two rows of the output. The item was out of stock from 2009-09-25 to 2009-12-04 and listed out of stock again from 2009-12-04 to 2009-12-07.

I would like to combine the out of stock status intervals to a single row for the intervals that have consecutive dates. Using the above example the first two rows would be combined into a single row with the start date of 2009-09-25 and end date of 2009-12-07.

Item Number Start Date End Date Stock Status
Item A 2009-09-25 2009-12-04 Out of Stock
Item A 2009-12-04 2009-12-07 Out of Stock
Item A 2009-12-07 2010-07-27 In Stock
Item A 2010-07-27 2010-07-27 Out of Stock
Item A 2010-07-27 2010-08-05 Out of Stock
Item A 2010-08-05 2010-10-07 Out of Stock
Item A 2010-10-07 2010-11-16 Out of Stock
Item A 2010-11-16 2011-07-01 In Stock
Item A 2011-07-01 2011-07-13 Out of Stock
Item A 2011-07-13 2011-08-03 Out of Stock
Item A 2011-08-03 2011-08-26 In Stock
Item A 2011-08-26 2011-08-29 Out of Stock
Item A 2011-08-29 2011-10-10 Out of Stock
Item A 2011-10-10 2011-11-23 In Stock
Item A 2011-11-23 2011-11-29 Out of Stock
Item A 2011-11-29 2011-11-29 Out of Stock
Item A 2011-11-29 2011-12-21 Out of Stock
Item A 2011-12-21 2017-11-06 In Stock
Item A 2017-11-06 2017-12-28 Out of Stock
Item A 2017-12-28 2018-01-15 In Stock
Item A 2018-01-15 2018-02-01 Out of Stock
Item A 2018-02-01 2019-03-19 In Stock
Item A 2019-03-19 2019-03-28 Out of Stock
Item A 2019-03-28 2021-03-30 In Stock
Item A 2021-03-30 2021-07-16 Out of Stock
Item A 2021-07-16 NULL In Stock

The desired output (the table below) is to have a single row for each interval of time that the item was out of stock. I don't need to have the intervals that an item was in stock.

Item Number Start Date End Date Stock Status
Item A 2009-09-25 2009-12-07 Out of Stock
Item A 2010-07-27 2010-11-16 Out of Stock
Item A 2011-07-01 2011-08-03 Out of Stock
Item A 2011-08-26 2011-10-10 Out of Stock
Item A 2011-11-23 2011-12-21 Out of Stock
Item A 2017-11-06 2017-12-28 Out of Stock
Item A 2018-01-15 2018-02-01 Out of Stock
Item A 2019-03-19 2019-03-28 Out of Stock
Item A 2021-03-30 2021-07-16 Out of Stock

I believe this problem is a gaps and islands problem but I'm having trouble finding resources that I can apply to this problem. I believe it can be solved with window functions but I'm also having trouble figuring out how to apply them to this query.

My thought process is to use a ranking function to add a column that can be used to partition the rows into blocks and then use MIN([Start Date]) OVER(PARTITION BY [Partition Block]) to find the start date and use MAX([End Date]) OVER(PARTITION BY [Partition Block]) to find the end date.

Something like this table below is what I envision as the intermediate result set but I need help generating the Partition Block column.

Item Number Out Date In Date Stock Status Partition Block
Item A 2009-09-25 2009-12-04 Out of Stock 1
Item A 2009-12-04 2009-12-07 Out of Stock 1
Item A 2009-12-07 2010-07-27 In Stock 0
Item A 2010-07-27 2010-07-27 Out of Stock 2
Item A 2010-07-27 2010-08-05 Out of Stock 2
Item A 2010-08-05 2010-10-07 Out of Stock 2
Item A 2010-10-07 2010-11-16 Out of Stock 2
Item A 2010-11-16 2011-07-01 In Stock 0
Item A 2011-07-01 2011-07-13 Out of Stock 3
Item A 2011-07-13 2011-08-03 Out of Stock 3
Item A 2011-08-03 2011-08-26 In Stock 0
Item A 2011-08-26 2011-08-29 Out of Stock 4
Item A 2011-08-29 2011-10-10 Out of Stock 4
Item A 2011-10-10 2011-11-23 In Stock 0
Item A 2011-11-23 2011-11-29 Out of Stock 5
Item A 2011-11-29 2011-11-29 Out of Stock 5
Item A 2011-11-29 2011-12-21 Out of Stock 5
Item A 2011-12-21 2017-11-06 In Stock 0
Item A 2017-11-06 2017-12-28 Out of Stock 6
Item A 2017-12-28 2018-01-15 In Stock 0
Item A 2018-01-15 2018-02-01 Out of Stock 7
Item A 2018-02-01 2019-03-19 In Stock 0
Item A 2019-03-19 2019-03-28 Out of Stock 8
Item A 2019-03-28 2021-03-30 In Stock 0
Item A 2021-03-30 2021-07-16 Out of Stock 9
Item A 2021-07-16 NULL In Stock 0

I'm open to any help or suggestions, thanks!


Solution

  • One way is to virtualize groupings based on a change in status. Then use a SUM with the UNBOUNDED PRECEDING to group markers into consecutive groups.

    SELECT
        --Aggregate results
        ItemName AS [Item Name],
        MIN(StartDate) [In Date],
        MAX(EndDate) [Out Date],
        MAX(StockStatus) AS [Stock Status],
        VirtualGroupID AS [Partition Block]   
    FROM
    (
        SELECT
            *,
            --Use SUM with the UNBOUNDED PRECEDING to gather all IsNewGroup Markers into consecutivly ordered groupings. 
            --All 0's between transitions to 1's will have a running sum of preceding 1's applied per partition.
            SUM(IsNewGroup) OVER (PARTITION BY ItemName ORDER BY StartDate ROWS UNBOUNDED PRECEDING) AS VirtualGroupID
        FROM
        (
            SELECT
                ItemName, StartDate, EndDate,StockStatus,
                --This will either be a 1 or 0 if the next item has a new StockStatus - The 1's will be summed above and serialized into island grouos so MAX AND MIN can be applied to each
                CASE WHEN ISNULL(LAG(StockStatus) OVER (PARTITION BY ItemName ORDER BY StartDate),StockStatus)<>StockStatus THEN 1 ELSE 0 END AS IsNewGroup
            FROM
               HistoricalStockStatus
        )AS X
        WHERE
            StockStatus = 'Out of Stock'
    )AS Y
    GROUP BY 
        ItemName,VirtualGroupID
    ORDER BY
        ItemName, MIN(StartDate)
    

    Edit : The query below requires one less pass over the data. untested of course.

    SELECT  
        --Aggregate results
        ItemName AS [Item Name],
        MIN(StartDate) [In Date],
        MAX(EndDate) [Out Date],
        MAX(StockStatus) AS [Stock Status],
        VirtualGroupID AS [Partition Block]
    FROM
    (
        SELECT
            ItemName, StartDate, EndDate,StockStatus,   
            SUM(CASE WHEN StockStatus='Out of Stock' THEN 0 ELSE 1 END) OVER (PARTITION BY ItemName ORDER BY StartDate ROWS UNBOUNDED PRECEDING) AS VirtualGroupID
        FROM
            HistoricalStockStatus
    )AS X
    WHERE
         StockStatus='Out of Stock'
    GROUP BY 
        ItemName,VirtualGroupID
    ORDER BY
        ItemName, MIN(StartDate)