How do I populate Dim_tbls from a relational source?
These example tables are given:
tbl_sales: id_sales, fk_id_customer, fk_id_product, country, timestamp
tbl_customer: id_customer, name, adress, zip, city
tbl_product: id_product, price, product
My goal is to get these attributes into a start-schema. The problem I have is the logic behind loading the dimension tables. I mean, what data would I load into the Dim_Product? All the products that are in tbl_product? But how would I know how many Sales are done with a specific product?
Analysis I would like to do are:
How many people bought product x.
How many sales are made from city x.
How many sales were made between Time x and y.
Example data:
tbl_sales: id_sales | fk_id_customer | fk_id_product | country | timestamp
1 | 2 | 1 | UK | 19.11.2013 10:23:22
2 | 1 | 2 | FR | 20.11.2013 06:04:22
tbl_customer: id_customer | name | adress | zip | city
1 | Frank|Street X| 211 | London
2 | Steve|Street Y| 431 | Paris
tbl_customer: id_product| Price | product
1 | 100,00| Hammer
2 | 50,00| Saw
Let's start with a very simple star schema model; for example, I assumed you don't need to worry about handling changes to dimensions' attributes.
DateKey
CustomerKey
ProductKey
Counter (=1; this is a factless fact table)
DateKey
Date
Year
Quarter
Month
...
CustomerKey
Name
Address
Zip
City
ProductKey
Name
Price (if it changes, you need move it to factSales)
How many people bought product x.
SELECT DISTINCT CustomerKey
FROM factSales
WHERE ProductKey IN ( SELECT ProductKey
FROM dimProduct
WHERE Name = 'Product X' )
How many sales are made from city x.
SELECT SUM(Counter)
FROM factSales
WHERE CustomerKey IN ( SELECT CustomerKey
FROM dimCustomer
WHERE City = 'City X' )
How many sales were made between Time x and y.
SELECT SUM(Counter)
FROM factSales
WHERE DateKey IN ( SELECT DateKey
FROM dimDate
WHERE Date BETWEEN DateX AND DateY )