Search code examples
sql-servert-sqlgaps-and-islands

How to create start and end date buckets using windowed functions in SQL


I have a list of timestamps (WTIME), some value associated with that timestamp (VAL) and a category (WTYPE). There will only ever be one entry per timestamp. I need to transform the data so that I have 1 row per "grouping" of WTYPES.

DECLARE @Table TABLE (
    WTIME DATETIME,
    VAL FLOAT,
    WTYPE INT
)
INSERT INTO @TABLE 
VALUES
('2023-07-31 01:00:00', 10, 1),
('2023-07-31 02:00:00', 11, 1),
('2023-07-31 03:00:00', 12, 1),
('2023-07-31 04:00:00', 4, 2),
('2023-07-31 05:00:00', 6, 2),
('2023-07-31 06:00:00', 15, 1)

In the example here, there are three timestamps with WTYPE 1, it then switches to WTYPE 2 and then back to WTYPE 1 - so I would need an output of 3 rows (i.e. the 3 times it changed WTYPE) with the start time, end time, average value, WTYPE and number of rows.

enter image description here

I know this can be done with joining the dataset back onto itself but this method runs too slow for my purposes and I feel it in my bones that it can be achieved with windowed functions alone, I just cannot wrap my head around it!


Solution

  • You could treat this as a classic gaps and islands problem and give each island a unique numbering and then group on that but the below can do it all in one scan ordered by WTIME so should be more efficient than that - Fiddle.

    The idea being to calculate the cumulative sums and counts across the whole table, whilst preserving the last row from each island and look back at the previous such preserved row to calculate the diffs.

    WITH T AS
    (
    SELECT *, 
            MinWTime = MIN(WTIME) OVER (ORDER BY WTIME ROWS UNBOUNDED PRECEDING) ,
            CumeVal = SUM(VAL) OVER (ORDER BY WTIME ROWS UNBOUNDED PRECEDING) , 
            CumeCount = COUNT(*) OVER (ORDER BY WTIME ROWS UNBOUNDED PRECEDING) ,
            NextType = LEAD(WTYPE) OVER (ORDER BY WTIME),
            NextWTime = LEAD(WTIME) OVER (ORDER BY WTIME)  
    FROM @TABLE
    ), LastInIsland AS
    (
    SELECT *,
           SUM_VAL = CumeVal - LAG(CumeVal,1,0) OVER (ORDER BY WTIME),
           NUM_ROWS = CumeCount - LAG(CumeCount,1,0) OVER (ORDER BY WTIME) ,
           START_TIME = LAG(NextWTime,1,MinWTime) OVER (ORDER BY WTIME) 
    FROM T
    WHERE NextType <> WTYPE OR NextType IS NULL
    )
    SELECT START_TIME,
           END_TIME = CASE WHEN NextType IS NOT NULL THEN WTIME END,
           WTYPE,
           AVG_VAL = SUM_VAL/ NUM_ROWS,
           NUM_ROWS
    FROM LastInIsland
    ORDER BY WTIME
    

    Optionally you can simplify the execution plan somewhat by adding a dummy temp table with a columnstore index into the mix to get batch mode window aggregates. Fiddle

    enter image description here