Search code examples
sqlpostgresqlstored-proceduresoverlapgaps-and-islands

Finding Gaps in Timestamps with Multiple Users and Overlapping Timeranges in PostgreSQL


This is a continuation of a previous post on this site: Finding Gaps in Timestamps for Multiple Users in PostgreSQL

I am working with a dataset containing Check-In and Check-Out times for multiple office rooms over the last 5 years. One of the projects I was asked to work on was calculating the amount of time each room is busy and vacant over various time ranges (daily, weekly, monthly, etc.) assuming set operational hours (7:30am to 5pm). Deviating from my last post, there are instances of overlapping time ranges. A sample of the dataset for one day looks like this:

room_id     check_in                check_out
"Room D"    "2014-07-18 12:23:00"   "2014-07-18 12:54:00"
"Room D"    "2014-07-19 09:16:00"   "2014-07-19 10:30:00"
"Room D"    "2014-07-19 09:10:00"   "2014-07-19 10:30:00"
"Room D"    "2014-07-18 08:45:00"   "2014-07-18 22:40:00"
"Room 5"    "2014-07-19 10:20:00"   "2014-07-19 12:20:00"
"Room 5"    "2014-07-18 07:59:00"   "2014-07-18 09:00:00"
"Room 5"    "2014-07-18 09:04:00"   "2014-07-18 14:00:00"
"Room 5"    "2014-07-18 07:59:00"   "2014-07-18 10:00:00"

From my previous post, I was very helpfully provided with this snippet of code which works perfectly for all instances where there is no overlap as pointed out by the author:

select date_trunc('day', start_dt), room_id,
       sum( least(extract(epoch from end_dt), v.epoch2) - 
            greatest(extract(epoch from start_dt), epoch1)
          ) as busy_seconds,
       (epoch2 - epoch1 -
        sum( least(extract(epoch from end_dt), v.epoch2) - 
             greatest(extract(epoch from start_dt), epoch1)
           )
       ) as free_seconds
from rooms r cross join
     (values (extract(epoch from date_trunc('day', start_dt) + interval '7 hours 30 minutes'),
              extract(epoch from date_trunc('day', start_dt) + interval '17 hour')
             )
     ) v(epoch1, epoch2)                  
group by date_trunc('day', start_dt), room_id

However, after digging through our data, there are more instances of overlapping time ranges than I expected. Here is the target output I would like to retrieve from the sample data above:

target_day      room_id         busy_time         Free Time
2014-07-18      Room D          8.25              1.25
2014-07-19      Room 4          1.33              8.17
2014-07-18      Room 5          8                 1.5
2014-07-19      Room 5          2                 7.5

I am learning PostgreSQL right now so this problem is a little over my head. Any help or guidance would be greatly appreciated!


Solution

  • To handle gaps, I would recommend combining them first -- say using a CTE. The following does the logic by:

    • Looking at the maximum end date before a given row (for the same room and same time.
    • Doing a cumulative sum where there is a gap between the previous maximum end date and the start date.
    • Using this to aggregate by room_id to calculate new start and end times.

    This should work, but you can validate the CTE before applying the logic in your other query (the only change is referring to the CTE rather than the base table).

    As a query:

    with r as (
          select room_id, min(start_dt) as start_dt, max(end_dt) as end_ddt
          from (select r.*,
                       count(*) over (filter where prev_end_dt < start_dt) over (partition by room_id date_trunc('day', start_dt) order by start_dt) as grp
                from (select r.*,
                             max(end_dt) over (partition by room_id, date_trunc('day', start_dt) rows between unbounded preceding and 1 preceding) as prev_end_dt
                      from rooms r
                     ) r
               ) r
          group by room_id, grp
         )
    select date_trunc('day', start_dt), room_id,
           sum( least(extract(epoch from end_dt), v.epoch2) - 
                greatest(extract(epoch from start_dt), epoch1)
              ) as busy_seconds,
           (epoch2 - epoch1 -
            sum( least(extract(epoch from end_dt), v.epoch2) - 
                 greatest(extract(epoch from start_dt), epoch1)
               )
           ) as free_seconds
    from r cross join
         (values (extract(epoch from date_trunc('day', start_dt) + interval '7 hours 30 minutes'),
                  extract(epoch from date_trunc('day', start_dt) + interval '17 hour')
                 )
         ) v(epoch1, epoch2)                  
    group by date_trunc('day', start_dt), room_id