I'm working with a table that maintains a history of records for each key, alongside a START_DATE
and END_DATE
field that signifies when the record changes go in and out of action, respectively:
KEY | START_DATE | END_DATE | FIELD_ONE | FIELD_TWO |
---|---|---|---|---|
1 | 2000-01-01 | 2000-01-02 | 10 | 20 |
1 | 2000-01-03 | 2000-01-04 | 10 | 21 |
1 | 2000-01-05 | 2000-01-06 | 11 | 21 |
1 | 2000-01-07 | 2000-01-08 | 10 | 20 |
1 | 2000-01-09 | 2000-01-10 | 10 | 21 |
FIELD_TWO
is not relevant for my purposes, and so I'd like to remove it in my extract. I end up with a table like this, in doing so:
KEY | START_DATE | END_DATE | FIELD_ONE |
---|---|---|---|
1 | 2000-01-01 | 2000-01-02 | 10 |
1 | 2000-01-03 | 2000-01-04 | 10 |
1 | 2000-01-05 | 2000-01-06 | 11 |
1 | 2000-01-07 | 2000-01-08 | 10 |
1 | 2000-01-09 | 2000-01-10 | 10 |
But now, I have records that are pretty much duplicates of one another - the START_DATE
and END_DATE
fields are signifying a change that cannot be seen in my sub-table. And so I'd like to "condense" the table to the following result:
KEY | START_DATE | END_DATE | FIELD_ONE |
---|---|---|---|
1 | 2000-01-01 | 2000-01-04 | 10 |
1 | 2000-01-05 | 2000-01-06 | 11 |
1 | 2000-01-07 | 2000-01-10 | 10 |
How can this be achieved?
I, naively, tried to use a GROUP BY
over KEY
and FIELD_ONE
with a MIN()
and MAX()
function applied to the START_DATE
and END_DATE
fields, but this does not work for the above example, where FIELD_ONE
changes to and from one value (10, in this example). You, of course, end up with the following:
KEY | START_DATE | END_DATE | FIELD_ONE |
---|---|---|---|
1 | 2000-01-01 | 2000-01-10 | 10 |
1 | 2000-01-05 | 2000-01-06 | 11 |
I'm working with PL/SQL, but can make the necessary translations from other SQL flavours as long as PL/SQL has the same tools.
Thank you!
This is a Gaps & Island problem. You can use the traditional solution:
select key,
min(start_date) as start_date,
max(end_date) as end_date,
field_one
from (
select
x.*,
sum(case when prev_one = field_one and prev_end + 1 = start_date
then 0 else 1 end) over(partition by key order by start_date) as g
from (
select t.*,
lag(field_one) over(partition by key order by start_date) as prev_one,
lag(end_date) over(partition by key order by start_date) as prev_end
from t
) x
) y
group by key, g, field_one
Result:
KEY START_DATE END_DATE FIELD_ONE
---- ----------- ---------- ---------
1 01-JAN-00 04-JAN-00 10
1 05-JAN-00 06-JAN-00 11
1 07-JAN-00 10-JAN-00 10
See running example at db<>fiddle.