I am participating in creation of reporting software which utilizes Kimball star schema methodology. Entire team (including me) hasn't worked with this technology so we are new in this.
There are couple of dimension and fact tables in or system so far. For example:
- DIM_Customer (dimension table for customers)
- DIM_BusinessUnit (dimension table for business units)
- FT_Transaction (fact table, granularity per transaction)
- FT_Customer (fact table for customer, customer id and as on date are in composite PK)
This is current structure of FT_Customer:
- customer_id # (customer id, part of composite PK)
- as_on_date # (date of observation, part of composite PK)
- waic (KPI)
- wat (KPI)
- waddl (KPI)
- wadtp (KPI)
- aging_bucket_current (KPI)
- aging_bucket_1_to_10 (KPI)
- aging_bucket_11_to_25 (KPI)
- ... ...
Fields waic, wat, waddl and wadtp are related to delay in transaction payment. These fields are calculated by aggregation query against FT_Transaction table grouped by customer_id and as_on_date.
Fields aging_bucket_current, aging_bucket_1_to_10 and aging_bucket_11_to_25 contains number of transactions categorized by delay in payment. For example aging_bucket_current contains number of transactions that are payed on time, aging_bucket_1_to_10 contains number of transactions that are payed with 1 to 10 days delay ...
This structure is used for report generation from PHP web application as well as Cognos studio. We discussed about restructuring FT_Customer table in order that make it more usable for external systems like Cognos.
New proposed structure of FT_Customer:
- customer_id # (customer id, part of composite PK)
- as_on_date # (date of observation, part of composite PK)
- kpi_id # (id of KPI, foreign key that points to DIM_KPI dimension table, part of composite PK)
- kpi_value (value KPI)
- ... ...
For this proposal we will have additional dimension table DIM_KPI:
- kpi_id #
- title
This table will contain all KPIs (wat, waic, waddl, aging buckets ...).
Second structure of FT_Customer will obviously have more rows than current structure.
Which structure of FT_Customer is more universal?
Is it acceptable to keep both structures in separate tables? This will obviously put additional burden to ETL layer because some of work will be done twice, but on the other side it will make easier generation of various reports.
Thanks in advance for suggenstions.
The 1st structure seems to be more natural and common to me. However, the 2nd one is more flexible, because it supports adding new KPIs without changing the structure of the fact table.
If different ways of accessing data actually require different structures, there is nothing wrong about having two fact tables with the same data, as long as:
You should test the results for any data inconsistencies.