In PostgreSQL, I am trying to find subjects that have a sequence of values below 60 followed by two consecutive values above 60 that occur afterwards. I'm also interested in the length of time between the first recorded value below 60 and the second value above 60. This event can occur multiple times for each subject.
I am struggling to find out how to search for an unlimited amount of values < 60 followed by 2 values >= 60.
RowID SubjectID Value TimeStamp
1 1 65 2142-04-29 12:00:00
2 1 58 2142-04-30 03:00:00
3 1 55 2142-04-30 04:00:00
4 1 54 2142-04-30 05:00:00
5 1 55 2142-04-30 06:15:00
6 1 56 2142-04-30 06:45:00
7 1 65 2142-04-30 07:00:00
8 1 65 2142-04-30 08:00:00
9 2 48 2142-05-04 03:30:00
10 2 48 2142-05-04 04:00:00
11 2 50 2142-05-04 05:00:00
12 2 69 2142-05-04 06:00:00
13 2 68 2142-05-04 07:00:00
14 2 69 2142-05-04 08:00:00
15 2 50 2142-05-04 09:00:00
16 2 55 2142-05-04 10:00:00
17 2 50 2142-05-04 10:30:00
18 2 67 2142-05-04 11:00:00
19 2 67 2142-05-04 12:00:00
My current attempt uses the lag and lead functions, but I am unsure about how to use these functions when I am unsure how far I need to look ahead. This is an example of looking ahead one value and behind one value. My problem is I do not know how to partition by subjectID
to look "t" time points ahead where "t" may be different for every subject.
select t.subjectId, t.didEventOccur,
(next_timestamp - timestamp) as duration
from (select t.*,
lag(t.value) over (partition by t.subjectid order by t.timestamp)
as prev_value,
lead(t.value) over (partition by t.subjectid order by
t.timestamp) as next_value,
lead(t.timestamp) over (partition by t.subjectid order by
t.timestamp) as next_timestamp
from t
) t
where value < 60 and next_value < 60 and
(prev_value is null or prev_value >= 60);
I hope to get an output such as:
SubjectID DidEventOccur Duration
1 1 05:00:00
2 1 03:30:00
2 1 03:00:00
A pure SQL solution like you have been asking for:
SELECT subjectid, start_at, next_end_at - start_at AS duration
FROM (
SELECT *
, lead(end_at) OVER (PARTITION BY subjectid ORDER BY start_at) AS next_end_at
FROM (
SELECT subjectid, grp, big
, min(ts) AS start_at
, max(ts) FILTER (WHERE big AND big_rn = 2) AS end_at -- 2nd timestamp
FROM (
SELECT subjectid, ts, grp, big
, row_number() OVER (PARTITION BY subjectid, grp, big ORDER BY ts) AS big_rn
FROM (
SELECT subjectid, ts
, row_number() OVER (PARTITION BY subjectid ORDER BY ts)
- row_number() OVER (PARTITION BY subjectid, (value > 60) ORDER BY ts) AS grp
, (value > 60) AS big
FROM tbl
) sub1
) sub2
GROUP BY subjectid, grp, big
) sub3
) sub4
WHERE NOT big -- identifies block of values <= 60 ...
AND next_end_at IS NOT NULL -- ...followed by at least 2 values > 60
ORDER BY subjectid, start_at;
I omitted the useless column DidEventOccur
and added start_at
instead. Otherwise exactly your desired result.
db<>fiddle here
Consider a procedural solution in plpgsql (or any PL) instead, should be faster. Simpler? I'd say yes, but that depends on who's judging. See (with explanation for the technique and links to more):