Search code examples
sqldata-warehousedimensional-modelingstar-schemascd

Am I implementing SCD type 1 & 7 correctly


SCD type 1

Suppose I've built SCD type 1 over the following data that comes from operational system:

ID | CHANNEL_CODE | NAME | TYPE
1  |      A       |  X   |  0
2  |      B       |  Y   |  1

Because, Surrogate Keys are preferable even for SCD type 1, we're throwing away ID column and generating SRK from natural key (CHANNEL_CODE):

SRK | CHANNEL_CODE | NAME | TYPE
11  |      A       |  X   |  0
12  |      B       |  Y   |  1

Means CHANNEL_CODE is expected to never change, in case of NAME or TYPE update - overwrite happens.

Is this the right standard implementation of SCD type 1?

SCD 1 + durable key

Natural keys may be changed because of sim or credit card change, duplicates, integration of source systems, business reasons, etc. From Kimball's Design Tip #147, I know the issue is solved with durable srk.

Means, operational system have to send me an event like: "From now CHANNEL_CODE=A is CHANNEL_CODE=C". So I should have the following data (fact table contains both srks):

DURABLE_SRK | SRK | CHANNEL_CODE | NAME | TYPE
    11      | 11  |      A       |  X   |  0
    12      | 12  |      B       |  Y   |  1
    11      | 13  |      C       |  X   |  0

Still change to NAME or TYPE columns will result in simple overwrite (no new rows).

Shall NAME be overwritten here by SRK or DURABLE_SRK? Is it still an SCD 1?

SCD type 7

In my understanding, from Kimball's Design Tip #152, SCD 7 = SCD 1 + durable key + SCD 2 (history for not natural key columns). So SCD type 7 should generate a new row on every column update. For example, on NAME update from X to Z where CHANNEL_CODE=C:

DURABLE_SRK | SRK | CHANNEL_CODE | NAME | TYPE | EFFECTIVE_START_DATE | EXPIDATION_DATE | IS_CURRENT_IND
    11      | 11  |      A       |  X   |  0   |      2020-05-02      |    2020-06-12   | False
    12      | 12  |      B       |  Y   |  1   |      2020-01-12      |    2100-01-01   | True
    11      | 13  |      C       |  X   |  0   |      2020-06-12      |    2020-08-15   | False
    11      | 13  |      C       |  Z   |  0   |      2020-08-15      |    2100-01-01   | True

Is this correct implementation of SCD type 7?


Solution

  • SCD TYPE1

    Yes, that's correct though there is no need to throw away ID and I would probably keep it as it may help in your ETL and for debugging purposes as it allows the corresponding record in the source system to be easily identified (see next paragraph for an example).

    SCD 1 + Durable Key

    If this is meant to be SCD1 then your example is incorrect. If the channel code on the same source record has changed then it would overwrite the record in the Dimension table, not insert a new record. This is a good example of why you should retain the ID as it makes it obvious how the records in your dimension relate to your source. For an SCD1, a SK and a Durable SK are, almost by definition, the same thing.

    I realise that your examples may be simplified compared to a real-world scenario but I would suggest that Channel Code is a genuine natural key and therefore would never change: a different Channel Code would imply a different record. A natural key only really changes when there is no genuine unique business identifier in the source record e.g. a person might have a genuine unique identifier such as social security number (which never changes) but if that wasn't available they might be identified by first name, last name and email address - any of which might change and therefore aren't genuine natural keys - and this would be a good case for including a Durable SK.

    SCD Type 7

    For this type, the Dimension table is entirely SCD Type 2 and includes a Durable SK. The SCD1 aspect can be thought of as virtual, as it is implemented as a View over the Dimension where the Current Flag = True. Any Fact table joining to this table has two FKs - one that holds the Dimension SK for the row applicable at the time of the event (standard SCD2 logic) and one that holds the Durable SK and references the View (to get the SCD1-like record)