Search code examples
oracle11gdata-warehouseinformatica-powercenterdimensional-modeling

Logic behind loading dimension tables


How do I populate Dim_tbls from a relational source?

These example tables are given:

tbl_sales:    id_sales, fk_id_customer, fk_id_product, country, timestamp   
tbl_customer: id_customer, name, adress, zip, city
tbl_product:  id_product, price, product

My goal is to get these attributes into a start-schema. The problem I have is the logic behind loading the dimension tables. I mean, what data would I load into the Dim_Product? All the products that are in tbl_product? But how would I know how many Sales are done with a specific product?

Analysis I would like to do are:

 How many people bought product x.
 How many sales are made from city x.
 How many sales were made between Time x and y. 

Example data:

 tbl_sales: id_sales | fk_id_customer | fk_id_product | country | timestamp 
                1    |       2        |      1        |   UK    | 19.11.2013 10:23:22
                2    |       1        |      2        |   FR    | 20.11.2013 06:04:22

 tbl_customer: id_customer | name | adress | zip | city
                      1    | Frank|Street X| 211 | London
                      2    | Steve|Street Y| 431 | Paris

 tbl_customer: id_product| Price | product
                      1  | 100,00| Hammer
                      2  |  50,00| Saw

Solution

  • Let's start with a very simple star schema model; for example, I assumed you don't need to worry about handling changes to dimensions' attributes.

    factSales

      DateKey
      CustomerKey
      ProductKey
      Counter (=1; this is a factless fact table)
    

    dimDate

      DateKey
      Date
      Year
      Quarter
      Month
      ...
    

    dimCustomer

      CustomerKey
      Name
      Address
      Zip
      City
    

    dimProduct

      ProductKey
      Name
      Price (if it changes, you need move it to factSales)
    

    How many people bought product x.

    SELECT DISTINCT CustomerKey
    FROM factSales
    WHERE ProductKey IN ( SELECT ProductKey
                          FROM dimProduct
                          WHERE Name = 'Product X' )
    

    How many sales are made from city x.

    SELECT SUM(Counter)
    FROM factSales
    WHERE CustomerKey IN ( SELECT CustomerKey
                           FROM dimCustomer
                           WHERE City = 'City X' )
    

    How many sales were made between Time x and y.

    SELECT SUM(Counter)
    FROM factSales
    WHERE DateKey IN ( SELECT DateKey
                       FROM dimDate
                       WHERE Date BETWEEN DateX AND DateY )