Search code examples
mysqldata-warehousemodelingdata-vault

Data vault model: what are hubs good for?


I was just reading about Data Vault modeling and as far as I understand it, the hub does only contain keys (and the record source). So I was wondering why I should create those hub tables, only to store the record source? Wouldn't it be enough to have only Satellites and Links?

Btw: I'm looking for simple mysql tables in a data vault form to download and play with.


Solution

  • The hub is where the passive integration of multiple sources is applied. You would have a column for data source and record all instances of each key as it first arrives in your hub. Example, if I have a CRM system and an ERP system and I sync the data from the CRM system first, then the ERP data comes available. I would add all of the keys from the CRM system, with the data source column value of "CRM". Then when I bring in the ERP system, assuming I have the same structure of keys for the table, I would only add the new keys that only exist in the ERP system with a data-source of "ERP". If the keys are different, you would have to add all of the data from both systems. The point is that you are retaining all of the data from all of the systems in play. When you move to your next layer, be it a Business Data Vault or a Data Mart, you would apply business logic against the hub and satellites according to the "Business Rules" to get to a single resulting row for the two systems where applicable. If you employ the transformation before storing it in this intermediate state, you lose the audit-ability, and the ability to change the business rules at a later date. Make sense?