Search code examples
mysqlsqlsql-servert-sqlscd

Generic procedure to perform SCD in sql


I have 2 tables in mssql server.I can perform scd through custom insert/update/delete and also through Merge statement.

I want to know that is there any generic procedure that could server the purpose. we just pass it 2 tables and it should porform the SCD. any option in SQL server 2008? Thanks


Solution

  • No, there isn't and there can't be a generic one suitable for no matter what tables you pass to it. For several reasons:

    • How do you know which SCD type? (Okay, could be another parameter, but...)
    • How do you know which column should be historicized and which should be overwritten?
    • How do you determine which column is the business key, the surrogate key, the expiration column and so on?
    • To specify the columns in an update statement you must write dynamic sql, which is possible, but the above point comes into play

    Not a reason why it's not possible but also consider: For a proper UPSERT one usually works with temporary tables, the MERGE statement sucks for SCDs except in special cases. That is because you can't use a MERGE statement together with an INSERT/UPDATE and you would have to disable foreign keys for that, since an UPDATE is implemented as DELETE THEN INSERT (or something like that, don't remember clearly, but I had those problems when I tried).

    I prefer doing it this way (SCD type 2 and SQL Server that is):

    Step 1:

    IF EXISTS (
    SELECT * FROM sys.objects
    WHERE name = 'tmpDimSource')
    DROP TABLE tmpDimSource;
    SELECT
    *
    INTO tmpDimSource
    FROM
    (
    SELECT whatever
    FROM yourTable
    );
    

    Step 2:

    IF EXISTS (
    SELECT * FROM sys.objects
    WHERE name = 'tmpDimYourDimensionName')
    DROP TABLE tmpDimYourDimensionName;
    
    SELECT * INTO tmpDimYourDimensionName FROM D_yourDimensionName WHERE 1 = 0;
    INSERT INTO tmpDimYourDimensionName 
    (
    sid, /*a surrogate id column*/
    theColumnsYouNeedInYourDimension,
    validFrom
    )
    SELECT 
    ISNULL(d.sid, 0),
    ds.theColumnsYouNeedInYourDimension,
    DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0) /*the current date*/
    FROM
    tmpDimSource ds 
    LEFT JOIN D_yourDimensionName d ON ds.whateverId = c.whateverId
    ;
    

    The ISNULL(d.sid, 0) in step 2 is important. It returns the surrogate id of your dimension, if an entry already exists, otherwise 0.

    Step 3:

    UPDATE D_yourDimensionName SET 
    validTo = DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()) - 1, 0) /*yesterday*/
    FROM 
    D_yourDimensionName d
    INNER JOIN tmpDimYourDimensionName t ON d.sid = t.sid
    WHERE t.sid <> 0 AND
    (
    d.theColumnWhichHasChangedAndIsImportant <> t.theColumnWhichHasChangedAndIsImportant OR
    d.anotherColumn <> t.anotherColumn 
    )
    ;
    

    In Step 3 you mark the existing entry as not valid anymore and keep a history of it. The valid entry you get with WHERE validTo IS NULL.

    You can also add another UPDATE to overwrite any other column with the new value if needed.

    Step 4:

    INSERT INTO D_yourDimensionName 
    SELECT * FROM tmpDimYourDimensionName 
    WHERE sid = 0;
    

    And that's it.