I would like to summarize a time series of data by:
The image below shows a snippet of the data and the desired output.
Since the Categories
(1. Low, 2. Moderate, 3. Strong, 4. Extreme) repeat throughout the timeline, an approach such as the SELECT
below will summarize the MIN
and MAX
ReadingDateTime
values over the entire table in four rows.
ReadingDateTime
over all readings in the Low category.ReadingDateTime
over all readings in the Moderate category.ReadingDateTime
over all readings in the Strong category.ReadingDateTime
over all readings in the Extreme category. SELECT..
MIN(ReadingDateTime)
,MAX(ReadingDateTime)
FROM CurrentLogger
GROUP BY LoggerId
,Category
ORDER BY LoggerId
,ReadingDateTime
For example:
I have one approach, but I am sure there is a much better way. I mark the start and end of runs of Categories by using the SQL LAG
and LEAD
functions.
SELECT LoggerId
,Location
,Category
,dBDateTime
,FirstInRun = IIF(LAG(dBCategory,1,0) OVER(ORDER BY InstrumentId, dBDateTime)<>dBCategory,1,0)
,LastInRun = IIF(LEAD(dBCategory,1,0) OVER(ORDER BY InstrumentId, dBDateTime)<>dBCategory,1,0)
FROM CurrentLogger
ORDER BY InstrumentId
,dBDateTime
The output is below:
There are several ugly and slow ways to roll this up to the desired output.
My question is:
One way to roll it up is to use the ROW_NUMBER() function to create your groups and then get the MIN/MAX dates for each group. Consider the following:
CREATE TABLE #CurrentLogger(LoggerID int, Location varchar(20),Category varchar(20),ReadingDateTime datetime)
INSERT INTO #CurrentLogger VALUES
(10,'OutSide','1. Low','9/14/21 8:16:37 AM'),
(10,'OutSide','1. Low','9/14/21 8:16:39 AM'),
(10,'OutSide','1. Low','9/14/21 8:16:41 AM'),
(10,'OutSide','2. Moderate','9/14/21 8:16:43 AM'),
(10,'OutSide','2. Moderate','9/14/21 8:16:45 AM'),
(10,'OutSide','2. Moderate','9/14/21 8:16:47 AM'),
(10,'OutSide','2. Moderate','9/14/21 8:16:49 AM'),
(10,'OutSide','4. Extreme','9/14/21 8:16:51 AM'),
(10,'OutSide','3. Strong','9/14/21 8:16:53 AM'),
(10,'OutSide','3. Strong','9/14/21 8:16:55 AM'),
(10,'OutSide','3. Strong','9/14/21 8:16:57 AM'),
(10,'OutSide','2. Moderate','9/14/21 8:16:59 AM'),
(10,'OutSide','2. Moderate','9/14/21 8:17:01 AM'),
(10,'OutSide','2. Moderate','9/14/21 8:17:03 AM'),
(10,'OutSide','2. Moderate','9/14/21 8:17:05 AM'),
(10,'OutSide','1. Low','9/14/21 8:17:07 AM'),
(10,'OutSide','1. Low','9/14/21 8:17:09 AM'),
(10,'OutSide','1. Low','9/14/21 8:17:11 AM'),
(10,'OutSide','1. Low','9/14/21 8:17:13 AM'),
(10,'OutSide','1. Low','9/14/21 8:17:15 AM'),
(10,'OutSide','1. Low','9/14/21 8:17:17 AM');
WITH CTE AS
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY LoggerID ORDER BY ReadingDateTime)
- ROW_NUMBER() OVER(PARTITION BY LoggerID,Location,Category ORDER BY ReadingDateTime) Grp
FROM #CurrentLogger
)
SELECT LoggerID,Location,Category, MIN(ReadingDateTime) StartDate, MAX(ReadingDateTime) EndDate
FROM CTE
GROUP BY Grp,LoggerID,Location,Category
ORDER BY StartDate