Search code examples
sqlgoogle-bigquerywindow-functionsgaps-and-islands

Aggregating length of time intervals and grouping to fixed time grid


I have some data consisting of shifts, logging the time periods taken as breaks during the shift.

start_ts                end_ts                  shift_id
2022-01-01T08:31:37Z    2022-01-01T08:58:37Z    1
2022-01-01T08:37:37Z    2022-01-01T09:03:37Z    2
2022-01-01T08:46:37Z    2022-01-01T08:48:37Z    3

I want to map this data to a 15-minute grid, counting how many seconds in total (not per shift) are spent on break during that interval. A solution would look like this:

start_time               end_time               total_break_seconds
2022-01-01T08:30:00Z    2022-01-01T08:45:00Z      1246
2022-01-01T08:45:00Z    2022-01-01T09:00:00Z      1837
2022-01-01T09:00:00Z    2022-01-01T09:15:00Z      217

I know this is a gaps-and-islands style problem, but I'm not sure how to combine this with the mapping to a time grid element. I've looked at using UNIX_SECONDS/time-to-epoch to get the 15-minute intervals, but can't make it out. I'll be working with pretty large tables so ideally I would do as much work as possible before expanding each time interval to the 15-minute grid, but all solutions welcome.

I'm working on BigQuery

Here's a reproducible example to start with:

SELECT
    TIMESTAMP("2022-01-01 08:31:37") AS start_ts,
    TIMESTAMP("2022-01-01 08:58:37") AS end_ts,
    1 as shift_id
  UNION ALL (
    SELECT
      TIMESTAMP("2022-01-01 08:37:37") AS start_ts,
      TIMESTAMP("2022-01-01 09:03:37") AS end_ts,
      2 as shift_id
  )
  UNION ALL (
    SELECT
      TIMESTAMP("2022-01-01 08:46:37") AS start_ts,
      TIMESTAMP("2022-01-01 08:48:37") AS end_ts,
      3 as shift_id
       )

Solution

  • With below query:

    WITH breaks AS (
      SELECT *,
             CASE
               -- for staring break (considering start_ts and end_ts are in same break)
               WHEN break <= start_ts AND end_ts < break + INTERVAL 15 MINUTE THEN TIMESTAMP_DIFF(end_ts, start_ts, SECOND)
               WHEN break <= start_ts THEN 900 - TIMESTAMP_DIFF(start_ts, break, SECOND)
               -- for remaining breaks (considering full break + partial break)
               ELSE IF(DIV(diff, 900) > 0 AND break + INTERVAL 15 MINUTE < end_ts, 900, MOD(diff, 900))
             END AS elapsed
        FROM sample,
             UNNEST(GENERATE_TIMESTAMP_ARRAY(
               TIMESTAMP_TRUNC(start_ts, HOUR), TIMESTAMP_TRUNC(end_ts, HOUR) + INTERVAL 1 HOUR, INTERVAL 15 MINUTE
             )) break,
             UNNEST([TIMESTAMP_DIFF(end_ts, break, SECOND)]) diff
       WHERE break + INTERVAL 15 MINUTE >= start_ts AND break < end_ts
    )
    SELECT break AS start_time, break + INTERVAL 15 MINUTE AS end_time, SUM(elapsed) total_break_seconds 
      FROM breaks
     GROUP BY 1 ORDER BY 1;
    

    Output will be:

    enter image description here