Search code examples
ssasetlbusiness-intelligencecubepowerpivot

Efficiently store and retrieve price points over time ranges


We are loading data from a source that contains rates with date ranges. The rate seldom changes. E.g.

ProductId, SupplierId, CategoryId, Rate, ValidFrom, ValidTo
1, 2, 3, 4.0, 2014-01-01, 2014-06-30

This rate is valid for six months. The reporting requirement is to display average rates for flexible time ranges over products, suppliers, categories...

Therefore I joined the rates with the time dimension, resulting in a fact table like this:

ProductId, SupplierId, CategoryId, Rate, Date
1, 2, 3, 4.0, 2014-01-01
1, 2, 3, 4.0, 2014-01-02
... (approx. 180 rows for this one rate)

so there will be one rate per day. Since the rates are sometimes valid for a year or two, I am producing a large fact table with about 200 million rows and growing.

The cube loads in a couple of minutes and aggregates the data down to about 1.5 GB (while the datamart occupies about 20 GB). The cube query performance seems OK.

I could of course change the granularity, for example aggregate to a month. But the rates are daily rates, they do not change often but not necessarily on the first day of a month.

I just wanted to get opinions on the row proliferation the ETL process produces. Does it have a smell, are there better ways to get done what we need?


Solution

  • I'm not sure where the tipping point would be, but if the rates didn't change very often, I'd be inclined to just query the relational tables rather than try to put them in their own fact table.

    Meaning, if I just wanted to get the rates for a certain Product/Supplier/Category set over a range of dates, and I knew that the table was in the 1000-10000's, and not in the 100,000-1,000,000's, I would just run a straight sql query for the results, rather than getting it from a cube at all.

    And by the table, I mean the table with the start/end dates for each rate, not the one where you have a row for every rate, every day.

    If the relational table is indexed for my query, it should return plenty quickly.

    Where rates relate to other facts, Sales for instance, I would make the rate a dimension rather than a fact. That would be to answer queries like "what rate was in force when this sale took place?"