Search code examples
ssisssasdata-warehousesql-server-2012

Is it typical to have the same value for FKs in my fact table across all FK columns?


I'm new to multidimensional data warehousing and have been tasked my workplace in developing a data warehousing solution for reporting purposes, so this might be a stupid question but here it goes ...

Each record in my fact table have FK columns that link out to their respective dimension tables (ex. dimCustomer, dimGeography, dimProduct).

When loading the data warehouse during the ETL process, I first loaded up the dimension tables with the details, then I loaded the fact table and did lookup transformations to find the FK values to put in the fact table. In doing so it seems each row in the fact table has FKs of the same value (ex. row1 has a FK of 1 in each column across the board, row2 has value 2... etc.)

I'm just wondering if this is typical or if I need to rethink the design of the warehouse and ETL process.

Any suggestions would be greatly appreciated.

Thanks


Solution

  • Based on your comments, it sounds like there's a missed step in your ETL process.

    For a call center / contact center, I might start out with a fact table like this:

    CallFactID - unique key just for ETL purposes only
    AssociateID - call center associate who initially took the call
    ProductID - product that the user is calling about
    CallTypeID - General, Complaint, Misc, etc
    ClientID - company / individual that is calling
    CallDateID - linked to your Date (by day) Dimension
    CallTimeOfDayID - bucketed id for call time based on business rules
    CallStartTimestamp - ANSI timestamp of start time
    CallEndTimestamp - ANSI timestamp of end time
    CallDurationTimestamp - INTERVAL data type, or integer in seconds, call duration
    

    Your dimension tables would then be:

    AssociateDim
    ProductDim
    CallTypeDim
    ClientDim
    DateDim
    TimeOfDayDim
    

    Your ETL will need to build the dimensions first. If you have a relational model in your source system, you would typically just go to the "lookup" tables for various things, such as the "Products" table or "Associates" table, and denormalize any relationships that make sense to be included as attributes. For example, a relational product table might look like:

    PRODUCTS: ProductKey, 
              ProductName, 
              ProductTypeKey, 
              ProductManufacturerKey, 
              SKU, 
              UPC
    

    You'd denormalize this into a general product dimension by looking up the product types and manufacturer to end up with something like:

    PRODUCTDIM: PRODUCTID (DW surrogate key), 
                ProductKey, 
                ProductName, 
                ProductTypeDesc, 
                ManufacturerDesc,
                ManufacturerCountry,
                SKU,
                UPC
    

    For attributes that are only on your transaction (call record) tables but are low cardinality, you can create dimensions by doing SELECT DISTINCT on the these tables.

    Once you have loaded all the dimensions, you then load the fact by doing a lookup against each of the dimensions based on the natural keys (which you've preserved in the dimension), and then assign that key to the fact row.

    For a more detailed guide on ETL with DW Star Schemas, I highly recommend Ralph Kimball's book The Data Warehouse ETL Toolkit.