Search code examples
sqlsql-servert-sqlgaps-and-islands

From single Date column to Date Ranges, Ignoring certain columns


I have a table MITMAS that I want to turn into a SCD Type2 that has many columns of which I only want to keep track of changes of 3 columns MMGRP2,MMGRP3,MMSTAT per MMITNO, so I want to ignore the fields MMTST and MMASDF.

The Table and Dataset I work with:

CREATE TABLE tmp.MITMAS (
  MMITNO INT,
  MMSTAT INT,
  MMGRP2 VARCHAR(1),
  MMGRP3 VARCHAR(1),
  MMTST VARCHAR(2),
  MMASDF VARCHAR(2),
  ETL_Date DATE
);

INSERT INTO tmp.mitmas (MMITNO, MMSTAT, MMGRP2, MMGRP3, MMTST, MMASDF, ETL_Date)
VALUES 
(333556, 5, 'X', 'A', 'T1', 'H1', '2023-02-01'),
(333556, 5, 'X', 'A', 'T2', 'H1', '2023-02-02'),
(333556, 5, 'X', 'A', 'T1', 'H1', '2023-02-03'),
(333556, 5, 'Y', 'A', 'T2', 'H2', '2023-02-04'),
(333556, 6, 'Y', 'A', 'T1', 'H2', '2023-02-05'),
(333556, 6, 'Y', 'A', 'T1', 'H2', '2023-02-06'),
(333556, 6, 'Y', 'A', 'T1', 'H2', '2023-02-07'),
(324261, 5, 'X', 'A', 'T1', 'H1', '2023-02-01'),
(324261, 5, 'Y', 'A', 'T1', 'H2', '2023-02-02'),
(324261, 5, 'Y', 'A', 'T2', 'H2', '2023-02-03'),
(324261, 5, 'Y', 'A', 'T2', 'H1', '2023-02-04'),
(324261, 5, 'Y', 'B', 'T2', 'H2', '2023-02-05'),
(324261, 5, 'X', 'B', 'T1', 'H2', '2023-02-06'),
(324261, 5, 'X', 'B', 'T2', 'H1', '2023-02-07'),
(324261, 6, 'X', 'B', 'T2', 'H2', '2023-02-08'),
(324261, 6, 'X', 'B', 'T2', 'H2', '2023-02-09');

The expected output:

CREATE TABLE tmp.MITMAS_SCD (
  MMITNO INT NOT NULL,
  MMSTAT INT NOT NULL,
  MMGRP2 VARCHAR(50) NOT NULL,
  MMGRP3 VARCHAR(50) NOT NULL,
  StartDate DATE NOT NULL,
  EndDate DATE NOT NULL
);
INSERT INTO tmp.MITMAS_SCD (MMITNO, MMSTAT, MMGRP2, MMGRP3, StartDate, EndDate)
VALUES 
(333556, 5, 'X', 'A', '2023-02-01', '2023-02-04'),
(333556, 5, 'Y', 'A', '2023-02-04', '2023-02-05'),
(333556, 6, 'Y', 'A', '2023-02-05', '9999-12-31'),
(324261, 5, 'X', 'A', '2023-02-01', '2023-02-02'),
(324261, 5, 'Y', 'A', '2023-02-02', '2023-02-05'),
(324261, 5, 'Y', 'B', '2023-02-05', '2023-02-06'),
(324261, 5, 'X', 'B', '2023-02-06', '9999-12-31');

I imagine I should think in the following direction:

SELECT
  MM.MMITNO,
  MM.MMSTAT,
  MM.MMGRP2,
  MM.MMGRP3,
  MM.ETL_Date AS StartDate,
  COALESCE(LEAD(MM.ETL_Date) OVER (PARTITION BY MM.MMITNO ORDER BY MM.ETL_Date), '9999-12-31') AS EndDate
FROM tmp.MITMAS MM;

But this obviously doesn't ignore the changes in the MMTST and MMASDF columns.

Any suggestions what to adjust?


Solution

  • WITH
      partitioned_enumerated AS
    (
      SELECT
        *,
        ROW_NUMBER()
          OVER (
            PARTITION BY MMITNO
                ORDER BY ETL_Date
          )
            AS row_id_MMITNO,
        ROW_NUMBER()
          OVER (
            PARTITION BY MMITNO, MMSTAT, MMGRP2, MMGRP3
                ORDER BY ETL_Date
          )
            AS row_id_MMITNO_STATE
      FROM
        MITMAS
    )
    SELECT
      MMITNO,
      MMSTAT, MMGRP2, MMGRP3,
      MIN(ETL_Date),
      LEAD(
        MIN(ETL_Date)
      )
        OVER (
          PARTITION BY MMITNO
              ORDER BY MIN(ETL_Date)
        )
    FROM
      partitioned_enumerated
    GROUP BY
      MMITNO,
      MMSTAT, MMGRP2, MMGRP3,
      row_id_MMITNO - row_id_MMITNO_STATE
    
    MMITNO MMSTAT MMGRP2 MMGRP3 (No column name) (No column name)
    324261 5 X A 2023-02-01 2023-02-02
    324261 5 Y A 2023-02-02 2023-02-05
    324261 5 Y B 2023-02-05 2023-02-06
    324261 5 X B 2023-02-06 2023-02-08
    324261 6 X B 2023-02-08 null
    333556 5 X A 2023-02-01 2023-02-04
    333556 5 Y A 2023-02-04 2023-02-05
    333556 6 Y A 2023-02-05 null

    fiddle