Thanks in advance for looking at this.
I have prepared a data warehouse and am populating a customer table with customer information from 2 systems. Customers from one system have the same Business Key as Customers in the other system.
What would be the best way to distinguish between the two so I don't update records don't want to and maintain good data integrity.
I wondered about a system flag but I am unsure.
All suggestions/questions welcome.
I agree with the comment above - the 'source system' key is a composite key from the source system + another string or int identifying the actual source system. This is seperate to the surrogate key mentioned in the previous answers. You really have two keys in your dimension. One is the standard IDENTITY
surrogate key - no suprises there.
The other is a composite key consisting of the key from the source system and an identifier (I actually usually just use a string) that tells you which system it comes from.
So your dimension looks like this:
Customer_SK SRC_Key SRC_System Customer Name
1 5 SAP Jim
2 5 MYOB Joe
As your DW develops and new source systems are introduced you just keep adding SRC_Systems
You can put these in different columns as suggested in another answer but then you end up with this:
Customer_SK SRC_Key_SAP SRC_Key_MYOB Customer Name
1 5 NULL Jim
2 NULL 5 Joe
Which seems a bit wasteful and requires you to add a column everytime a new system comes online.
The important question is: does the same customer exist in both source systems? This design actually allows for merging across rows if they do.
Also make absolutely sure you put a unique constraint on SRC_Key
, SRC_System
as this aids in performance, ensures integrity, and self documents the key.