I have a list of timestamps (WTIME), some value associated with that timestamp (VAL) and a category (WTYPE). There will only ever be one entry per timestamp. I need to transform the data so that I have 1 row per "grouping" of WTYPES.
DECLARE @Table TABLE (
WTIME DATETIME,
VAL FLOAT,
WTYPE INT
)
INSERT INTO @TABLE
VALUES
('2023-07-31 01:00:00', 10, 1),
('2023-07-31 02:00:00', 11, 1),
('2023-07-31 03:00:00', 12, 1),
('2023-07-31 04:00:00', 4, 2),
('2023-07-31 05:00:00', 6, 2),
('2023-07-31 06:00:00', 15, 1)
In the example here, there are three timestamps with WTYPE 1, it then switches to WTYPE 2 and then back to WTYPE 1 - so I would need an output of 3 rows (i.e. the 3 times it changed WTYPE) with the start time, end time, average value, WTYPE and number of rows.
I know this can be done with joining the dataset back onto itself but this method runs too slow for my purposes and I feel it in my bones that it can be achieved with windowed functions alone, I just cannot wrap my head around it!
You could treat this as a classic gaps and islands problem and give each island a unique numbering and then group on that but the below can do it all in one scan ordered by WTIME
so should be more efficient than that - Fiddle.
The idea being to calculate the cumulative sums and counts across the whole table, whilst preserving the last row from each island and look back at the previous such preserved row to calculate the diffs.
WITH T AS
(
SELECT *,
MinWTime = MIN(WTIME) OVER (ORDER BY WTIME ROWS UNBOUNDED PRECEDING) ,
CumeVal = SUM(VAL) OVER (ORDER BY WTIME ROWS UNBOUNDED PRECEDING) ,
CumeCount = COUNT(*) OVER (ORDER BY WTIME ROWS UNBOUNDED PRECEDING) ,
NextType = LEAD(WTYPE) OVER (ORDER BY WTIME),
NextWTime = LEAD(WTIME) OVER (ORDER BY WTIME)
FROM @TABLE
), LastInIsland AS
(
SELECT *,
SUM_VAL = CumeVal - LAG(CumeVal,1,0) OVER (ORDER BY WTIME),
NUM_ROWS = CumeCount - LAG(CumeCount,1,0) OVER (ORDER BY WTIME) ,
START_TIME = LAG(NextWTime,1,MinWTime) OVER (ORDER BY WTIME)
FROM T
WHERE NextType <> WTYPE OR NextType IS NULL
)
SELECT START_TIME,
END_TIME = CASE WHEN NextType IS NOT NULL THEN WTIME END,
WTYPE,
AVG_VAL = SUM_VAL/ NUM_ROWS,
NUM_ROWS
FROM LastInIsland
ORDER BY WTIME
Optionally you can simplify the execution plan somewhat by adding a dummy temp table with a columnstore index into the mix to get batch mode window aggregates. Fiddle