Search code examples
etldata-warehousedimensional-modelingstar-schemastar-schema-datawarehouse

Should I use multiple fact tables for each grain or just aggregate from lowest grain?


Fairly new to data warehouse design and star schemas. We have designed a fact table which is storing various measures about Memberships, our grain is daily and some of the measures in this table are things like qty sold new, qty sold renewing, qty active, qty cancelled.

My question is this, the business will want to see the measures at other grains such as monthly, quarterly, yearly etc.. so would typically the approach here just be to aggregate the day level data for whatever time period was needed or would you recommend creating separate fact tables for the "key" time periods for our business requirements e.g. monthly, quarterly, yearly? I have read some mixed information on this which is mainly why I'm seeking others views.

Some information I read had people embedding a hierarchy in the fact table to designate different grains which was then identified via a "level" type column, which was advised against by quite a few people and didn't seem good to me also, those advising against we're suggesting separate fact tables per grain but to be honest I don't see why we wouldn't just aggregate from the daily entries we have, what benefits would we get from a fact table for each grain other than some slight performance improvements maybe?


Solution

  • Each DataMart will have its own "perspective", which may require an aggregated fact grain.

    Star schema modeling is a "top-down" process, where you start from a set of questions or use cases and build a schema that makes those questions easy to answer. Not a "bottom-up" process where you start with the source data and figure out the schema design from there.

    You may end up with multiple data marts that share the same granular fact table, but which need to aggregate it in different ways, either for performance, or to have a gran to calculate and store a measure that only makes sense at the aggregated grain.

    Eg

    SalesFact (store,day,customer,product,quantiy,price,cost)
    

    and

    StoreSalesFact(store, week, revenue, payroll_expense, last_year_revenue)