Search code examples
data-warehousesnapshotdimensional-modeling

Periodic snapshot fact table with large dimensions


I have been asked to model a star diagram.
I have 3 dimensions:

  1. Date (day,month, year, week, quarter, ...)
  2. place (500 distinct values)
  3. Product (80k different products)

The main question is how many items (products) are stored at the end of a day in every place.

After some study-time with regards to dimensional modeling. I think I should implement a Periodic snapshot table. However reading trough the Kimball Docs, I noticed that a periodic snapshot demands an entry for every combination of the dimensions. This means I should add 40M rows every day (80k*500).
Knowing that the products are (real) slow movers and that many places store zero products during long periods, this sounds like an extreme overkill.
FYI the transactions in the source DB are 150k rows after three years.

So should I really add 40M rows every day, or could I just add the non-empty stores with their products specified? Also if for whatever reason one day all stores are empty, should I make an entry for that day (with dimensions N/A for store and product)?


Solution

  • You modeled correctly. It depends from the specifications, but normally you store only the products that are present in a location (you do not store zeroes), which could yield a number substantially lower than the maximum 80k.

    If you want to further reduce your numbers, you could store the last N days and then start to move data in a "cold" table. You store (say) last 10 day snapshot, then only monthly snapshots in the main "hot" Fact Table.

    Do not exclude the possibility to calculate the snapshot on the fly in report system, depending on your environment it could be easy (in MDX or DAX for example it is). Mixed solutions are also possible (i.e only the last month calculated on the fly).