I have a table MITMAS
that I want to turn into a SCD Type2 that has many columns of which I only want to keep track of changes of 3 columns MMGRP2
,MMGRP3
,MMSTAT
per MMITNO
, so I want to ignore the fields MMTST
and MMASDF
.
The Table and Dataset I work with:
CREATE TABLE tmp.MITMAS (
MMITNO INT,
MMSTAT INT,
MMGRP2 VARCHAR(1),
MMGRP3 VARCHAR(1),
MMTST VARCHAR(2),
MMASDF VARCHAR(2),
ETL_Date DATE
);
INSERT INTO tmp.mitmas (MMITNO, MMSTAT, MMGRP2, MMGRP3, MMTST, MMASDF, ETL_Date)
VALUES
(333556, 5, 'X', 'A', 'T1', 'H1', '2023-02-01'),
(333556, 5, 'X', 'A', 'T2', 'H1', '2023-02-02'),
(333556, 5, 'X', 'A', 'T1', 'H1', '2023-02-03'),
(333556, 5, 'Y', 'A', 'T2', 'H2', '2023-02-04'),
(333556, 6, 'Y', 'A', 'T1', 'H2', '2023-02-05'),
(333556, 6, 'Y', 'A', 'T1', 'H2', '2023-02-06'),
(333556, 6, 'Y', 'A', 'T1', 'H2', '2023-02-07'),
(324261, 5, 'X', 'A', 'T1', 'H1', '2023-02-01'),
(324261, 5, 'Y', 'A', 'T1', 'H2', '2023-02-02'),
(324261, 5, 'Y', 'A', 'T2', 'H2', '2023-02-03'),
(324261, 5, 'Y', 'A', 'T2', 'H1', '2023-02-04'),
(324261, 5, 'Y', 'B', 'T2', 'H2', '2023-02-05'),
(324261, 5, 'X', 'B', 'T1', 'H2', '2023-02-06'),
(324261, 5, 'X', 'B', 'T2', 'H1', '2023-02-07'),
(324261, 6, 'X', 'B', 'T2', 'H2', '2023-02-08'),
(324261, 6, 'X', 'B', 'T2', 'H2', '2023-02-09');
The expected output:
CREATE TABLE tmp.MITMAS_SCD (
MMITNO INT NOT NULL,
MMSTAT INT NOT NULL,
MMGRP2 VARCHAR(50) NOT NULL,
MMGRP3 VARCHAR(50) NOT NULL,
StartDate DATE NOT NULL,
EndDate DATE NOT NULL
);
INSERT INTO tmp.MITMAS_SCD (MMITNO, MMSTAT, MMGRP2, MMGRP3, StartDate, EndDate)
VALUES
(333556, 5, 'X', 'A', '2023-02-01', '2023-02-04'),
(333556, 5, 'Y', 'A', '2023-02-04', '2023-02-05'),
(333556, 6, 'Y', 'A', '2023-02-05', '9999-12-31'),
(324261, 5, 'X', 'A', '2023-02-01', '2023-02-02'),
(324261, 5, 'Y', 'A', '2023-02-02', '2023-02-05'),
(324261, 5, 'Y', 'B', '2023-02-05', '2023-02-06'),
(324261, 5, 'X', 'B', '2023-02-06', '9999-12-31');
I imagine I should think in the following direction:
SELECT
MM.MMITNO,
MM.MMSTAT,
MM.MMGRP2,
MM.MMGRP3,
MM.ETL_Date AS StartDate,
COALESCE(LEAD(MM.ETL_Date) OVER (PARTITION BY MM.MMITNO ORDER BY MM.ETL_Date), '9999-12-31') AS EndDate
FROM tmp.MITMAS MM;
But this obviously doesn't ignore the changes in the MMTST
and MMASDF
columns.
Any suggestions what to adjust?
WITH
partitioned_enumerated AS
(
SELECT
*,
ROW_NUMBER()
OVER (
PARTITION BY MMITNO
ORDER BY ETL_Date
)
AS row_id_MMITNO,
ROW_NUMBER()
OVER (
PARTITION BY MMITNO, MMSTAT, MMGRP2, MMGRP3
ORDER BY ETL_Date
)
AS row_id_MMITNO_STATE
FROM
MITMAS
)
SELECT
MMITNO,
MMSTAT, MMGRP2, MMGRP3,
MIN(ETL_Date),
LEAD(
MIN(ETL_Date)
)
OVER (
PARTITION BY MMITNO
ORDER BY MIN(ETL_Date)
)
FROM
partitioned_enumerated
GROUP BY
MMITNO,
MMSTAT, MMGRP2, MMGRP3,
row_id_MMITNO - row_id_MMITNO_STATE
MMITNO | MMSTAT | MMGRP2 | MMGRP3 | (No column name) | (No column name) |
---|---|---|---|---|---|
324261 | 5 | X | A | 2023-02-01 | 2023-02-02 |
324261 | 5 | Y | A | 2023-02-02 | 2023-02-05 |
324261 | 5 | Y | B | 2023-02-05 | 2023-02-06 |
324261 | 5 | X | B | 2023-02-06 | 2023-02-08 |
324261 | 6 | X | B | 2023-02-08 | null |
333556 | 5 | X | A | 2023-02-01 | 2023-02-04 |
333556 | 5 | Y | A | 2023-02-04 | 2023-02-05 |
333556 | 6 | Y | A | 2023-02-05 | null |