I have time-series data that I have wrangled into a #temp table in the format below:
Start | Finish | Value |
---|---|---|
2022-05-12 11:30 | 2022-05-12 11:45 | 28.264 |
2022-05-12 11:45 | 2022-05-12 12:00 | 28.262 |
2022-05-12 12:00 | 2022-05-12 12:15 | 28.242 |
2022-06-09 11:30 | 2022-06-09 11:45 | 27.862 |
2022-06-09 12:00 | 2022-06-09 12:15 | 13.727 |
2022-06-09 12:15 | 2022-06-09 12:30 | 13.717 |
2022-06-09 12:30 | 2022-06-09 12:45 | 13.724 |
2022-06-09 12:45 | 2022-06-09 13:00 | 9.1226 |
I want to identify spans of time in this data where the Start time of the next row overlaps with the Finish time of the current row and find the average of the values in the time span and maximum value observed in that time span. It should look like this:
Start | Finish | AvgVal | MaxVal |
---|---|---|---|
2022-05-12 11:30 | 2022-05-12 12:15 | 28.256 | 28.264 |
2022-06-09 11:30 | 2022-06-09 11:45 | 27.862 | 27.862 |
2022-06-09 12:00 | 2022-06-09 13:00 | 12.573 | 13.727 |
I was thinking of PARTITIONing over the table but I am not sure how to condition the partitioning on having the Start of the next row equal to the Finish of the current row. This is procedurally easy to write, but I have to do this in a stored procedure so I would like to do it entirely descriptively.
Perhaps I could first add ROW_NUMBER() OVER (ORDER BY Start ASC) as Row_Num
to my initial table, then:
SELECT Start, Finish, AVERAGE(Value) AS AvgVal, MAX(Value) AS MaxVal
FROM #temp
PARTITION BY
... something where the row number is used to index into the next row to compare the next row's Start time with the current row's Finish time?
Any help is appreciated.
First you need to identify the continuous group of time series. You can use LAG()
and compare previous Finish
with current row Start
. Perform a cumulative sum()
and you gets the grouping required. After that it is just a normal GROUP BY
query to get the AVG()
and MAX()
select [Start] = min([Start]),
[Finish] = max([Finish]),
[AvgVal] = avg([Value]),
[MaxVal] = max([Value])
from
(
select *,
[grp] = sum(g) over (order by [Start])
from
(
select *,
[g] = case when [Start] = lag([Finish]) over (order by [Start])
then 0
else 1
end
from #temp
) t
) t
group by [grp]
order by [Start]