Search code examples
database-designdata-warehousestar-schema

Setting up Time Dimensions for Resolutions in a Star-Schema


I've been reading about architecture types for OLAP processing, specifically the Star Schema concept. I'm currently setting up a database for testing.

My Situation

I have 750 sensors, each will be posting data to an SQL database every minute. Currently, my approach is to post if there is a change in the value. Thinking about this a little further makes me wonder if it could pose an issue in determining if there is data loss vs an unchanged value and makes me reconsider scheduled posting.

The software that inserts the sensor's value posts a time stamp in the format 2020-01-23 13:48:52. Along with this, the sensor name and sensor value are also posted. So, 4 columns in my fact table if you count the primary key.

I know I'll need to do some intermediary processing on my data (probably a table that is processed on a schedule to get it to conform to the schema, then dropped), but for now, I'm trying to keep this high-level and tackle one issue at a time.

Here is what I kind of have envisioned currently, but I'm lost on how to handle the Time Dimension considering the resolution I need.

Star Schema Mock-up

My Questions

Currently, my approach is to post if there is a change in the value. Thinking about this a little further makes me wonder if it could pose an issue in determining if there is data loss vs an unchanged value and makes me reconsider scheduled posting.

Is this a valid concern to identify data loss or is there another way to address it?

Considering my needed resolution, what would a sample time dimension look like?


Solution

  • If I understand your situation correctly, you are recording values that represent some system status at a point of time. In other words, you are capturing the system's snapshot in time. In a star schema, you should then use a "periodic snapshot fact table". Such fact tables capture values regardless if they have changed or not, because their grain is a unit of date/time, not a transaction.

    Time dimension can be modeled in several ways. I would do the following:

    Create dimension "Date" to deal with calendar dates. Grain: one calendar day. Span: from the day you have data, until today including.

    Create dimension "Time" to deal with time of the day. Grain: one minute. Span: 24 hours (so total records in this table are 60 min * 24 hours).

    Add Date and Time keys to the fact table.

    I would also keep time stamp as a degenerate dimension in the fact table, to have access to seconds if needed.