Search code examples
sql-serverdata-warehouseazure-sql-databasestar-schema

Selecting data across day boundaries from Star schema with separate date and time dimensions


What is the correct way to model data in a star schema such that a BI tool (such as PowerBI) can select a date range crossing multiple days?

I've currently got fact tables that have separate date and time dimensions. My time resolution is to the second, date resolution is to the day.

It's currently very easy to do aggregation providing the data of interest is in the same day, or even multiple complete days, but it becomes a lot more complicated when you're asking for, say, a 12 hour rolling window that crosses the midnight boundary.

Yes, I can write a SQL statement to first pull out all rows for the entirety of the days in question, and then by storing the actual date time as a field in the fact table I can further filter down to the actual time range I'm interested in, but that's not trivial (or possible in some cases) to do in BI reporting tools.

However this must be a frequent scenario in data warehouses... So how should it be done?

An example would be give me the count of ordered items from the fact_orders table between 2017/Jan/02 1600 and 2017/Jan/03 0400.

Orders are stored individually in the fact_orders table.

In my actual scenario I'm using Azure SQL database, but it's more of a general design question.

Thank you.


Solution

  • My first option would be (as you mention in the question) to include a calculated column (Date + Time) in the SQL query and then filter the time part inside the BI tool.

    If that doesn't work, you can create a view in the database to achieve the same effect. The easiest is to take the full joined fact + dimensions SQL query that you'd like to use in the BI tool and add the date-time column in the view.

    Be sure to still filter on the Date field itself to allow index use! So for your sliding window, your parameters would be something like

    WHERE Date between 2017/Jan/02 AND 2017/Jan/03 AND DateTime between 2017/Jan/02 1600 and 2017/Jan/03 0400

    If that doesn't perform well enough due to data volumes, you might want to set up and maintain a separate table or materialized view (depending on your DB and ETL options) that does a Cartesian join of the time dimension with a small range of the Date dimension (only the last week or whatever period you are interested in partial day reports), then join the fact table to that.

    The DateTimeWindow table/view would be indexed on the DateTime column and have only two extra columns: DateKey and TimeKey. Inner join that to the fact table using both keys and you should get exactly the window you want when the BI tool supplies a datetime range.