Search code examples
sqloracle-databasegaps-and-islands

Detect biggest date overlap entity


I want to find the maximum concurrent capacity of each instructor. If there is 1 day overlap between two course durations, then they are considered to be concurrent. I want the output as in column F.

Is there a way to achieve this output in Oracle SQL?

instructor schedule

Script to create the data-

create table instructor_schedule(instructor_id varchar2(5), course_id varchar2(5), course_start_dt date, course_end_dt date);

insert into instructor_schedule(instructor_id, course_id, course_start_dt, course_end_dt) values('I1', 'C1', to_date('01-JAN-2022', 'DD-MON-YYYY'), to_date('30-JAN-2022', 'DD-MON-YYYY'));
insert into instructor_schedule(instructor_id, course_id, course_start_dt, course_end_dt) values('I1', 'C2', to_date('25-DEC-2021', 'DD-MON-YYYY'), to_date('15-JAN-2022', 'DD-MON-YYYY'));
insert into instructor_schedule(instructor_id, course_id, course_start_dt, course_end_dt) values('I1', 'C3', to_date('25-JAN-2022', 'DD-MON-YYYY'), to_date('05-FEB-2022', 'DD-MON-YYYY'));
insert into instructor_schedule(instructor_id, course_id, course_start_dt, course_end_dt) values('I1', 'C4', to_date('26-JAN-2022', 'DD-MON-YYYY'), to_date('26-JAN-2022', 'DD-MON-YYYY'));
insert into instructor_schedule(instructor_id, course_id, course_start_dt, course_end_dt) values('I1', 'C5', to_date('01-MAR-2022', 'DD-MON-YYYY'), to_date('05-MAR-2022', 'DD-MON-YYYY'));

insert into instructor_schedule(instructor_id, course_id, course_start_dt, course_end_dt) values('I2', 'C1', to_date('20-AUG-2022', 'DD-MON-YYYY'), to_date('22-AUG-2022', 'DD-MON-YYYY'));
insert into instructor_schedule(instructor_id, course_id, course_start_dt, course_end_dt) values('I2', 'C2', to_date('03-SEP-2022', 'DD-MON-YYYY'), to_date('04-SEP-2022', 'DD-MON-YYYY'));
insert into instructor_schedule(instructor_id, course_id, course_start_dt, course_end_dt) values('I2', 'C3', to_date('02-SEP-2022', 'DD-MON-YYYY'), to_date('02-SEP-2022', 'DD-MON-YYYY'));
insert into instructor_schedule(instructor_id, course_id, course_start_dt, course_end_dt) values('I2', 'C4', to_date('01-SEP-2022', 'DD-MON-YYYY'), to_date('05-SEP-2022', 'DD-MON-YYYY'));

Solution

  • You may use scalar subquery with the condition on intersecting intervals:

    select
      m.*
      , (
        select count(*)
        from instructor_schedule lkp
        where m.instructor_id = lkp.instructor_id
          and m.course_id != lkp.course_id
          and m.course_start_dt <= lkp.course_end_dt
          and lkp.course_start_dt <= m.course_end_dt
      ) as intersects
    from instructor_schedule m
    
    INSTRUCTOR_ID COURSE_ID COURSE_START_DT COURSE_END_DT INTERSECTS
    I1 C1 2022-01-01 2022-01-30 3
    I1 C2 2021-12-25 2022-01-15 1
    I1 C3 2022-01-25 2022-02-05 2
    I1 C4 2022-01-26 2022-01-26 2
    I1 C5 2022-03-01 2022-03-05 0
    I2 C1 2022-08-20 2022-08-22 0
    I2 C2 2022-09-03 2022-09-04 1
    I2 C3 2022-09-02 2022-09-02 1
    I2 C4 2022-09-01 2022-09-05 2

    fiddle

    UPD

    Alternatively you may use a single table scan and model clause to perform calculations over multiple rows. cv below stands for current value of a specified dimension and brackets [...] are used to reference dimension values.

    select /*+gather_plan_statistics*/
      instructor_id,
      course_id, course_start_dt, course_end_dt,
      intersects,
      concurrent_courses,
      max_cap_flg
        
    from instructor_schedule m
    model
      partition by (instructor_id)
      dimension by (course_id, course_start_dt, course_end_dt)
      measures(
        0 as intersects,
        cast(null as varchar(1000)) as concurrent_courses,
        course_id as dummy,
        0 as max_cap_flg
      )
      rules update  sequential order(
        intersects[any, any, any]
          = count(intersects)[
              course_id != cv(course_id),
              course_start_dt <= cv(course_end_dt),
              course_end_dt >= cv(course_start_dt)
            ],
        concurrent_courses[any, any, any]
          = listagg(dummy, ',') within group(order by null)[
              course_id != cv(course_id),
              course_start_dt <= cv(course_end_dt),
              course_end_dt >= cv(course_start_dt)
            ],
        max_cap_flg[any,any,any]
          = case
              when intersects[cv(), cv(), cv()] = max(intersects)[any, any, any]
              then 1
            end
      )
    
    INSTRUCTOR_ID COURSE_ID COURSE_START_DT COURSE_END_DT INTERSECTS CONCURRENT_COURSES MAX_CAP_FLG
    I1 C1 2022-01-01 2022-01-30 3 C4,C3,C2 1
    I1 C2 2021-12-25 2022-01-15 1 C1 null
    I1 C3 2022-01-25 2022-02-05 2 C4,C1 null
    I1 C4 2022-01-26 2022-01-26 2 C3,C1 null
    I1 C5 2022-03-01 2022-03-05 0 null null
    I2 C1 2022-08-20 2022-08-22 0 null null
    I2 C2 2022-09-03 2022-09-04 1 C4 null
    I2 C3 2022-09-02 2022-09-02 1 C4 null
    I2 C4 2022-09-01 2022-09-05 2 C3,C2 1

    fiddle