Search code examples
data-modelingdata-warehouse

Table Design for Data with multiple Granularity


I have been trying to understand data modelling and warehousing, Given the best practices of having only one type of granularity in the table can I have one table store both the low granular data with the aggregate data.

2 Table Structure

Table 1

TransactionID   Transaction_Dt  ProductID   Items   Cost_Total
11111           1/1/2020        1           10      100
11111           1/1/2020        2           5       200
11111           1/1/2020        3           4       400
11111           1/1/2020        4           5       100
11111           1/1/2020        5           6       600
11111           1/1/2020        6           10      100

Table 2 (Aggregated)

TransactionID   Transaction_Dt  Total_Items Cost_Total
11111           1/1/2020        40          1500

One table Structure

Aggregate Data in the table

TransactionID   Transaction_Dt  ProductID   Items   Cost_Total  Type
11111           1/1/2020        1           10      100         ind_Item
11111           1/1/2020        2           5       200         ind_Item
11111           1/1/2020        3           4       400         ind_Item
11111           1/1/2020        4           5       100         ind_Item
11111           1/1/2020        5           6       600         ind_Item
11111           1/1/2020        6           10      100         ind_Item
**11111         1/1/2020        ALL         40      1500        all_Item**

Here we have one record for the entire transaction with the sum of all items and sum of all cost.

Can anyone help me on the Cons of the 2nd approach where we have aggregated data in the same table


Solution

  • Some thoughts about this:

    1. I am not a fan of storing data at multiple levels of aggregation in one table for the reason that @Marmite Bomber suggested - if you do a select sum and don't filter out the aggregates, you'll get a multiple of the answer you're looking for.
    2. If you do still want to put everything into one table, I'd add another column, perhaps called agg_level, and indicate the aggregation level of the row in that table. (you are already kind of doing this with your 'Type' column although Type is a very ambiguous term).
    3. I'd recommend against changing the TransactionID value (you propose adding some asterisks to indicate that it's an aggregate). Modifying it will make it harder to search for the one you want and users will have to understand your notation to get the right records. If you do add an agg_level column and leave the TransactionIDs in their original form, you could put an easily recognizable term in the agg_level column. For example, a record could say "raw", or "Transaction Total", or "Monthly Aggregate"...
    4. If you have to put your aggregates into your base data table, like you've shown, you should consider creating views on top of the table, each view providing only detail at one level of aggregation. You likely would give users access only to these views, and not to the base data. In this way, you store everyone in one table but, to users, it looks like you have multiple tables and you needn't worry about a user accidentally misforming a query that brings back duplicate totals.

    It's a good question, Snehasish, and it shows that you've been giving it some thought. Best of luck as you navigate the need going forward!