Search code examples
data-warehousestar-schemamicrostrategyfact-tablesnowflake-schema

Star Schema Design for User Utilization Reports


Scenario: There are 3 kinds of utilization metrics that i have derive for the users. In my application, users activity are tracked using his login history, number of customer calls made by the user, number of status changes performed by user.

All these information are maintained in 3 different tables in my application db like UserLoginHistory, CallHistory, OrderStatusHistory. All the actions made by each user is stored in these 3 tables along with DateTime info.

Now i am trying to create a reporting db that will help me in generating the overall utilization of user. Basically the report should show me for each user over a period:

  1. UserName
  2. Role
  3. Number of Logins Made
  4. Number of Calls Made
  5. Number of Status updates Made

Now i am in the process of designing my fact table. How should i go about creating a Fact table for this scenario? Should i go about creating a single fact table with rows in it capturing all these details at the granular date level (in my DimDate table level) or 3 different fact tables and relate them?

The 2 options i described above arent convincing and i am looking for better design. Thanks.


Solution

  • As rule of thumb, when you have a report which uses different facts/metrics (Number of Logins Made, Number of Calls Made, Number of Status updates Made) with the same granularity (UserName, Role, Day/Hour/Minute), you put them in the same fact table, to avoid expensive joins.

    For many reasons this is not always possible, but your case seems to me a bit different.

    You have three tables with the user activity, where probably you store more detailed information about logins, calls and Status updates. What you need for your report is a table with your metrics and the values aggregated for the time granularity that you need.

    Let's say you need the report at the day level, you need a table like this:

    Day        UserID RoleID #Logins #Calls #StatusUpdate
    20150101   1      1      1       5      3
    20150101   2      1      4       15     8
    

    If tomorrow the business will require the report by hour, the you will need:

    DayHour            UserID RoleID #Logins #Calls #StatusUpdate
    20150101 10:00AM   1      1      1       2      1
    20150101 11:00AM   1      1      0       3      2
    20150101 09:00AM   2      1      2       10     4
    20150101 10:00AM   2      1      2       5      4
    

    Then the Day level table will be like an aggregated (by Day) version of the second one. The DayHour attribute is child of the Day one.

    If you need minute details you go down with the granularity.

    You can also start directly with a summary table at the minute level, but I would double check the requirement with the business, usually one hour range (or 15 minutes) are enough.

    Then if they need to get more detailed information, you can always drill down querying your original tables. The good thing is that when you drill to that level you should have just a small set of rows to query (like just few hours for a specific UserName) and your database should be able to handle it.