Search code examples
sql-serverlaggaps-and-islandslead

Combine chain of dates where dates are less than few seconds apart


I have a list of timestamps for data acquisition. timestamps close together belong to a cycle. I want to enumerate these cycles. So whenever between two timestamps is more then 100 seconds then it will create the next cycle.

CREATE TABLE [Cycles](
    [Cycle] [int] NOT NULL,
    [CycleStart] [datetime] NOT NULL,
    [CycleEnd] [datetime] NOT NULL,
 CONSTRAINT [PK_Cycles] PRIMARY KEY CLUSTERED 
(
    [Cycle] DESC
))
INSERT INTO [Cycles] VALUES
(10,'2023-12-04T9:00:00','2023-12-04T10:00:00'),
(11,'2023-12-04T21:00:00','2023-12-04T22:00:00'),
(12,'2023-12-04T23:00:00','2023-12-05T00:00:00')
CREATE TABLE [Data](
    [datatimestamp] [datetime] NOT NULL,
 CONSTRAINT [PK_Data] PRIMARY KEY NONCLUSTERED 
(
    [datatimestamp] ASC
))
INSERT INTO [Data] VALUES
('2023-12-05T00:05:20'),
('2023-12-05T00:05:21'),
('2023-12-05T00:05:22'),
('2023-12-05T00:10:01'),
('2023-12-05T00:10:02'),
('2023-12-05T00:10:03')

So I would need to add to Cycles 13 and 14

Here is what I could do as a Select:

DECLARE @lastCycle int = (SELECT TOP 1 Cycle FROM Cycles ORDER BY Cycle DESC);
DECLARE @lastCycleEnd datetime = (SELECT TOP 1 CycleEnd FROM Cycles ORDER BY Cycle DESC);
WITH marks AS (
    SELECT datatimestamp, 
    CASE 
        WHEN DATEDIFF(Second, LAG(datatimestamp, 1, DATEADD(Second, -101, datatimestamp)) OVER (ORDER BY datatimestamp), datatimestamp) > 100 
        THEN 1 ELSE 0 
    END AS NextC
    FROM [Data] 
    WHERE datatimestamp > @lastCycleEnd 
)
SELECT @lastCycle + ROW_NUMBER() OVER (ORDER BY d.datatimestamp) AS Cycle, d.datatimestamp AS CycleBegin 
FROM [Data] d
INNER JOIN marks m On m.datatimestamp = d.datatimestamp
WHERE m.NextC = 1

Which will return the new Cycles and their CycleStarts, as the result for the example data would look like this:

Cycle CycleStart
13 2023-12-05 00:05:20
14 2023-12-05 00:10:01

How can I get the CycleEnd aswell as the third column?


Solution

  • Instead of filtering on NextC as the final step, once you have your data from the marks CTE. i.e.

    datatimestamp NextC
    2023-12-05T00:05:20 1
    2023-12-05T00:05:21 0
    2023-12-05T00:05:22 0
    2023-12-05T00:10:01 1
    2023-12-05T00:10:02 0
    2023-12-05T00:10:03 0

    You need to then do SUM(NextC) OVER(ORDER BY datatimestamp), which will give you a value for each group by timestamps, i.e.

    datatimestamp Cycle
    2023-12-05T00:05:20 1
    2023-12-05T00:05:21 1
    2023-12-05T00:05:22 1
    2023-12-05T00:10:01 2
    2023-12-05T00:10:02 2
    2023-12-05T00:10:03 2

    You can then group on this column and get the min and max datetimes to get your start/end. So your final query would be:

    DECLARE @lastCycle int = (SELECT TOP 1 Cycle FROM Cycles ORDER BY Cycle DESC);
    DECLARE @lastCycleEnd datetime = (SELECT TOP 1 CycleEnd FROM Cycles ORDER BY Cycle DESC);
    WITH marks AS (
        SELECT datatimestamp, 
       CASE 
            WHEN DATEDIFF(Second, LAG(datatimestamp, 1, DATEADD(Second, -101, datatimestamp)) OVER (ORDER BY datatimestamp), datatimestamp) > 100 
            THEN 1 ELSE 0 
        END AS NextC
        FROM [Data] 
        WHERE datatimestamp > @lastCycleEnd 
    ), marks2 AS (
      SELECT m.DataTimeStamp, SUM(m.NextC) OVER (ORDER BY m.DataTimeStamp) AS Cycle
      FROM marks AS m)
    SELECT @lastCycle + ROW_NUMBER() OVER (ORDER BY m.Cycle) AS Cycle, 
      MIN(m.datatimestamp) AS CycleBegin ,
      MAX(m.datatimestamp) AS CycleEnd
    FROM marks2 m 
    GROUP BY m.Cycle;
    

    Example on db<>fiddle