Search code examples
sqlssisdata-warehousedata-integrityscd

SSIS Business Key Duplicates (more than one LOBS)


Thanks in advance for looking at this.

I have prepared a data warehouse and am populating a customer table with customer information from 2 systems. Customers from one system have the same Business Key as Customers in the other system.

What would be the best way to distinguish between the two so I don't update records don't want to and maintain good data integrity.

I wondered about a system flag but I am unsure.

All suggestions/questions welcome.


Solution

  • I agree with the comment above - the 'source system' key is a composite key from the source system + another string or int identifying the actual source system. This is seperate to the surrogate key mentioned in the previous answers. You really have two keys in your dimension. One is the standard IDENTITY surrogate key - no suprises there.

    The other is a composite key consisting of the key from the source system and an identifier (I actually usually just use a string) that tells you which system it comes from.

    So your dimension looks like this:

    Customer_SK    SRC_Key    SRC_System    Customer Name
    1              5          SAP           Jim       
    2              5          MYOB          Joe
    
    • Your ETL code from MYOB > DW knows to only look at MYOB data
    • Your ETL code from SAP > DW knows to only look at SAP data
    • Your data warehouse only uses the Customer_SK surrogate key

    As your DW develops and new source systems are introduced you just keep adding SRC_Systems

    You can put these in different columns as suggested in another answer but then you end up with this:

    Customer_SK    SRC_Key_SAP     SRC_Key_MYOB    Customer Name
    1              5               NULL            Jim       
    2              NULL            5               Joe
    

    Which seems a bit wasteful and requires you to add a column everytime a new system comes online.

    The important question is: does the same customer exist in both source systems? This design actually allows for merging across rows if they do.

    Also make absolutely sure you put a unique constraint on SRC_Key, SRC_System as this aids in performance, ensures integrity, and self documents the key.