Search code examples
postgresqlgroup-bytimestampintervals

identify work intervals from timestamps in postgres


I am trying to determine how long a "work session" is based on records with timestamps stored in Postgres. This would be fairly easy in javascript/python/go etc. but I want to do it in postgresql.

If the first record is at 10pm and there are records at least once every hour until 4am then a 10 hour gap then another group of records, I want to group the records from 10pm - 4am into one bucket and the records from 4pm+10 hours - whenever there's another multi-hour gap, into another bucket.

How can the logic, that detects "gaps" to define the bucket borders be written in SQL? There is no fixed "start time" or "stop time".

Some example records

{'2021-05-05 22:00:05', 'user1'}, 
{'2021-05-05 22:25:33', 'user1'}, 
{'2021-05-05 23:33:11', 'user1'},
{'2021-05-06 00:08:34', 'user1'},
{'2021-05-06 00:36:22', 'user1'},
{'2021-05-06 22:15:00', 'user1'},
{'2021-05-06 23:08:00', 'user1'},
{'2021-05-07 00:01:03', 'user1'}

In the above records, one bucket would be from 2021-05-05 22:00:05 - 2021-05-06 00:36:22 because the interval to the next record is more than N hours.


Solution

  • Here is what I wound up doing. I used Postgres lag function to compare records ordered by time with the last record and then calculated the gap between the two records. If the gap is greater than X then it's a new bucket.

    Based on googling "gaps and islands" as suggested by @AdamKG