Search code examples
data-modelingdata-warehousesnowflake-cloud-data-platform

Fact table is not connected to dim because of different datasources


I have a fact table for which I am getting data from a source A.

Then, I have source B which has one common field with the source A. So, I wonder how I should populate the fact table with the keys from the dim tables?

It is not just a simple data pull as it requires some serious cleaning. I assume that in the staging tables I should not do any transformation. Does it mean that it would look like the following?

  1. Pull the data from source B to the staging tables.
  2. Do the cleaning and save the results in the pre_dim table(s)
  3. Join the fact table common field with the tables from (2)?
  4. Create dim tables
  5. Link dim tables with the fact table by the surrogate key?

Every day truncate the table in (2) and re-create the new table - see (2)?


Solution

  • You had a right direction, but I am usually separate all levels(staging, etl,dwh) because it's easier to control the data, clean and verify.

    So here's what I would do:

    1. Pull the data from source B and A to the staging tables(st_A and st_B).
    2. Create table etl_A and etl_B with PK dim_B and FK dim_B.
    3. Truncate table and Insert data from source B into etl_B with creating unique PK data in dim_B.
    4. Truncate table and Insert data from source A which joined etl_B ( you can fill FK dim_B now)
    5. And next step maybe partition exchange? Or insert, or bulk. I don't know what you RDBMS you use and your purposes.