Search code examples
sqloracleplsqlgroup-bygrouping

Grouping of Sub-Tables while Maintaining Ranges of Time


I'm working with a table that maintains a history of records for each key, alongside a START_DATE and END_DATE field that signifies when the record changes go in and out of action, respectively:

KEY START_DATE END_DATE FIELD_ONE FIELD_TWO
1 2000-01-01 2000-01-02 10 20
1 2000-01-03 2000-01-04 10 21
1 2000-01-05 2000-01-06 11 21
1 2000-01-07 2000-01-08 10 20
1 2000-01-09 2000-01-10 10 21

FIELD_TWO is not relevant for my purposes, and so I'd like to remove it in my extract. I end up with a table like this, in doing so:

KEY START_DATE END_DATE FIELD_ONE
1 2000-01-01 2000-01-02 10
1 2000-01-03 2000-01-04 10
1 2000-01-05 2000-01-06 11
1 2000-01-07 2000-01-08 10
1 2000-01-09 2000-01-10 10

But now, I have records that are pretty much duplicates of one another - the START_DATE and END_DATE fields are signifying a change that cannot be seen in my sub-table. And so I'd like to "condense" the table to the following result:

KEY START_DATE END_DATE FIELD_ONE
1 2000-01-01 2000-01-04 10
1 2000-01-05 2000-01-06 11
1 2000-01-07 2000-01-10 10

How can this be achieved?

I, naively, tried to use a GROUP BY over KEY and FIELD_ONE with a MIN() and MAX() function applied to the START_DATE and END_DATE fields, but this does not work for the above example, where FIELD_ONE changes to and from one value (10, in this example). You, of course, end up with the following:

KEY START_DATE END_DATE FIELD_ONE
1 2000-01-01 2000-01-10 10
1 2000-01-05 2000-01-06 11

I'm working with PL/SQL, but can make the necessary translations from other SQL flavours as long as PL/SQL has the same tools.

Thank you!


Solution

  • This is a Gaps & Island problem. You can use the traditional solution:

    select key,
      min(start_date) as start_date,
      max(end_date) as end_date, 
      field_one 
    from (
      select
        x.*, 
        sum(case when prev_one = field_one and prev_end + 1 = start_date
            then 0 else 1 end) over(partition by key order by start_date) as g
      from (
        select t.*,
          lag(field_one) over(partition by key order by start_date) as prev_one,
          lag(end_date) over(partition by key order by start_date) as prev_end
        from t
      ) x
    ) y
    group by key, g, field_one
    

    Result:

     KEY  START_DATE  END_DATE   FIELD_ONE 
     ---- ----------- ---------- --------- 
     1    01-JAN-00   04-JAN-00  10        
     1    05-JAN-00   06-JAN-00  11        
     1    07-JAN-00   10-JAN-00  10        
    

    See running example at db<>fiddle.