Search code examples
data-warehousedata-vault

Data Vault: difference between business, surrogate, retained keys


Surrogate and business keys are mentioned in wikipedia article about Data Vault.enter link description here. Retained keys are mentioned here.

  1. According to wikipedia article it is not mandatory to specify the surrogate keys for the hub/satellite or link. However, in the link it is mandatory to specify the surrogate keys for the hub it connects. If it is not mandatory to specify the surrogate keys in hubs, how will I enter them in the link table?

  2. Surrogate keys vs business keys. Surrogate keys are needed to connect the structures. Business keys are used to uniquely identify an item. But we could do it also via surrogate key, couldn't we? What is the main difference between the business key and a surrogate key and why cannot I just use a surrogate key as a businees key or vice versa?

  3. In later article they write:

Hub table – contains business key and retained key. The retained key is a key which is mapped to business key one-to-one. In comparison, the surrogate key includes time and there can be many surrogate keys corresponding to one business key. This explains the name of the key, it is retained with insertion of a new version of a row while surrogate key is increasing. Why we need the retained key? There can be many sources, so we cannot use business key as it may have different type or different number of attributes in different source systems.

Because of many resources, there might be different business keys for the same item. But this specific item will be uniquely identified by a retained key. Did I understand it right? So, that mean that a surrogate key maps to different business keys and not one-to-one.

Example. Lets say we have a hub "Employee" with two different business keys coming from two different departments. There is an employee A. From Accounting department employee A will have a business key AT654 and from Project department he will have a business key PR454. The hub will have them two entries for this employer:

RetainedKey   BK      TimeStamp    Source
AAA111        AT654   24.03.2012   Accounting
AAA111        PR454   25.03.2012   Project

But how can I incorporate a surrogate key? Namely, in which situation it will be increased (as stated in "This explains the name of the key,it is retained with insertion of a new version of a row whilesurrogate key is increasing") ?

Could somebody give me an example (situation) of usage of retained keys, business keys and surrogate keys?


Solution

  • There are three main terms in Data Vault 2.0:

    • Business Key: The key how the business identifies an object.
    • Surrogate Key: The key how the business identifies an object if no direct business key is available. This is a combined key out of several fields making a row unique.
    • Hash Key: The keys used in Hubs, Satellites and Links to join tables; they are generated out of the business / surrogate keys.
    • Retained Key must be connected to Data Vault 1.0. I've never heard of it.

    If you need to describe the fact that you might have a business key which is different in several systems but means the same then one would use Same-As-Links tables for storage and lookup.

    So to join your two systems you'll want to create a Same-As-Link structure, which would look like this:

    Hash Key        Accounting_Key     Project_Key     Timestamp
    xyzabc...       AT654              PR454           24.05.2012
    

    Then this table can be used as a lookup table.