I retrieve data from SCD-2 table with many parameters and I need to build my own SCD-2 with only one of them. Therefore, I need to get rid of excessive intervals. Please recommend an algorithm to perform that in the best way.
What I receive from the source table:
I need to transform it to:
You can use following steps to get the required result. Of course you can do it all in one step with sub-selects or CTEs, but for better traceability I prefere temporary tables.
DROP TABLE IF EXISTS #source;
CREATE TABLE #source (key1 integer, value1 integer, row_actual_from date, row_actual_to date);
INSERT INTO #source
VALUES
(19999923, 15, '2020-01-01', '2020-01-02'),
(19999923, 15, '2020-01-03', '2020-01-05'),
(19999923, 15, '2020-01-06', '2020-01-08'),
(19999923, 11, '2020-01-09', '2020-01-12'),
(19999923, 3434, '2020-01-13', '2020-01-15'),
(19999923, 11, '2020-01-16', '2020-01-20'),
(19999923, 15, '2020-01-21', '2020-02-02'),
(19999923, 3434, '2020-02-03', '2020-02-10'),
(19999923, 3434, '2020-02-11', '2020-02-19'),
(19999923, 3434, '2020-02-20', '2020-02-25'),
(19999923, 99, '2020-02-26', '9999-12-31');
Step 1: Identify start and end for a single value period.
Note that in LAG/LEAD it is essentally to have a value as NULL replacement (-99 in example) which dosn't match with the possible values in the column.
DROP TABLE IF EXISTS #step1;
SELECT
key1, value1, row_actual_from, row_actual_to
, period_start = CASE WHEN LAG(value1, 1, -99) OVER (PARTITION BY key1 ORDER BY row_actual_from) <> value1 THEN 1 ELSE 0 END
, period_end = CASE WHEN LEAD(value1, 1, -99) OVER (PARTITION BY key1 ORDER BY row_actual_from) <> value1 THEN 1 ELSE 0 END
INTO #step1
FROM #source
ORDER BY key1, row_actual_from;
Step 2: Filter on start/end rows and assign row_actual_to of end to start.
If the period of a value has only one row this row has period_start and period_end set to 1 and therefore the sum is 2. In this case the content of row_acutal_to has already the wanted value.
DROP TABLE IF EXISTS #step2;
SELECT
key1, value1, row_actual_from, row_actual_to, period_start, period_end
, valid_from = row_actual_from
, valid_to = CASE (period_start + period_end)
WHEN 1 THEN LEAD(row_actual_to, 1) OVER (PARTITION BY key1, value1 ORDER BY row_actual_from)
WHEN 2 THEN row_actual_to ELSE NULL END
INTO #step2
FROM #step1
WHERE (period_start + period_end) > 0
ORDER BY key1, row_actual_from;
Step 3: Filter (adjusted) start row of value period.
SELECT key1, value1, valid_from, valid_to
FROM #step2
WHERE period_start = 1
ORDER BY key1, row_actual_from;