Search code examples
sqlsql-servert-sqlgaps-and-islands

How To Identify Continuous Spans Of Time And Compute Summary Statistics In Time-Series Data?


I have time-series data that I have wrangled into a #temp table in the format below:

Start Finish Value
2022-05-12 11:30 2022-05-12 11:45 28.264
2022-05-12 11:45 2022-05-12 12:00 28.262
2022-05-12 12:00 2022-05-12 12:15 28.242
2022-06-09 11:30 2022-06-09 11:45 27.862
2022-06-09 12:00 2022-06-09 12:15 13.727
2022-06-09 12:15 2022-06-09 12:30 13.717
2022-06-09 12:30 2022-06-09 12:45 13.724
2022-06-09 12:45 2022-06-09 13:00 9.1226

I want to identify spans of time in this data where the Start time of the next row overlaps with the Finish time of the current row and find the average of the values in the time span and maximum value observed in that time span. It should look like this:

Start Finish AvgVal MaxVal
2022-05-12 11:30 2022-05-12 12:15 28.256 28.264
2022-06-09 11:30 2022-06-09 11:45 27.862 27.862
2022-06-09 12:00 2022-06-09 13:00 12.573 13.727

I was thinking of PARTITIONing over the table but I am not sure how to condition the partitioning on having the Start of the next row equal to the Finish of the current row. This is procedurally easy to write, but I have to do this in a stored procedure so I would like to do it entirely descriptively.

Perhaps I could first add ROW_NUMBER() OVER (ORDER BY Start ASC) as Row_Num to my initial table, then:

SELECT Start, Finish, AVERAGE(Value) AS AvgVal, MAX(Value) AS MaxVal
FROM #temp
PARTITION BY

... something where the row number is used to index into the next row to compare the next row's Start time with the current row's Finish time?

Any help is appreciated.


Solution

  • First you need to identify the continuous group of time series. You can use LAG() and compare previous Finish with current row Start. Perform a cumulative sum() and you gets the grouping required. After that it is just a normal GROUP BY query to get the AVG() and MAX()

    select [Start]  = min([Start]),
           [Finish] = max([Finish]),
           [AvgVal] = avg([Value]),
           [MaxVal] = max([Value])
    from
    (
        select *, 
               [grp] = sum(g) over (order by [Start])
        from
        (
            select *, 
                   [g] = case when [Start] = lag([Finish]) over (order by [Start])
                              then 0
                              else 1
                              end
            from   #temp
        ) t
    ) t
    group by [grp]
    order by [Start]