Search code examples
sqlpostgresqltime-serieswindow-functionsgaps-and-islands

Detect Value Changes beyond a threshold in Time Series data in SQL


In PostgreSQL, I am trying to find subjects that have a sequence of values below 60 followed by two consecutive values above 60 that occur afterwards. I'm also interested in the length of time between the first recorded value below 60 and the second value above 60. This event can occur multiple times for each subject.

I am struggling to find out how to search for an unlimited amount of values < 60 followed by 2 values >= 60.

RowID    SubjectID    Value    TimeStamp    
1           1          65   2142-04-29 12:00:00 
2           1          58   2142-04-30 03:00:00 
3           1          55   2142-04-30 04:00:00
4           1          54   2142-04-30 05:00:00
5           1          55   2142-04-30 06:15:00  
6           1          56   2142-04-30 06:45:00
7           1          65   2142-04-30 07:00:00 
8           1          65   2142-04-30 08:00:00
9           2          48   2142-05-04 03:30:00 
10          2          48   2142-05-04 04:00:00
11          2          50   2142-05-04 05:00:00
12          2          69   2142-05-04 06:00:00
13          2          68   2142-05-04 07:00:00
14          2          69   2142-05-04 08:00:00
15          2          50   2142-05-04 09:00:00
16          2          55   2142-05-04 10:00:00
17          2          50   2142-05-04 10:30:00
18          2          67   2142-05-04 11:00:00
19          2          67   2142-05-04 12:00:00

My current attempt uses the lag and lead functions, but I am unsure about how to use these functions when I am unsure how far I need to look ahead. This is an example of looking ahead one value and behind one value. My problem is I do not know how to partition by subjectID to look "t" time points ahead where "t" may be different for every subject.

select t.subjectId, t.didEventOccur,
   (next_timestamp - timestamp) as duration
from (select t.*,
         lag(t.value) over (partition by t.subjectid order by t.timestamp) 
as prev_value,
         lead(t.value) over (partition by t.subjectid order by 
t.timestamp) as next_value,
         lead(t.timestamp) over (partition by t.subjectid order by 
 t.timestamp) as next_timestamp
  from t
 ) t
where value < 60 and next_value < 60 and
  (prev_value is null or prev_value >= 60);

I hope to get an output such as:

SubjectID  DidEventOccur Duration 
 1          1             05:00:00
 2          1             03:30:00
 2          1             03:00:00

Solution

  • A pure SQL solution like you have been asking for:

    SELECT subjectid, start_at, next_end_at - start_at AS duration
    FROM  (
       SELECT *
            , lead(end_at) OVER (PARTITION BY subjectid ORDER BY start_at) AS next_end_at
       FROM  (
          SELECT subjectid, grp, big
               , min(ts) AS start_at
               , max(ts) FILTER (WHERE big AND big_rn = 2) AS end_at  -- 2nd timestamp
          FROM  (
             SELECT subjectid, ts, grp, big
                  , row_number() OVER (PARTITION BY subjectid, grp, big ORDER BY ts) AS big_rn
             FROM  (
                SELECT subjectid, ts
                     , row_number() OVER (PARTITION BY subjectid ORDER BY ts)
                     - row_number() OVER (PARTITION BY subjectid, (value > 60) ORDER BY ts) AS grp
                     , (value > 60) AS big
                FROM   tbl
                ) sub1
             ) sub2
          GROUP  BY subjectid, grp, big
          ) sub3
       ) sub4
    WHERE  NOT big                 -- identifies block of values <= 60 ...
    AND    next_end_at IS NOT NULL -- ...followed by at least 2 values > 60
    ORDER  BY subjectid, start_at;
    

    I omitted the useless column DidEventOccur and added start_at instead. Otherwise exactly your desired result.

    db<>fiddle here

    Consider a procedural solution in plpgsql (or any PL) instead, should be faster. Simpler? I'd say yes, but that depends on who's judging. See (with explanation for the technique and links to more):