Search code examples
sql-serverssisetldata-warehousescd

SCD Inferred member duplicate entry


How to best handle this scenario:

  1. Category dimension table contains "Food" and "Drink" categories
  2. Fact record arrives with unknown category and creates inferred member "N/A" in Category dimension.
  3. We have a process in place to review inferred members, turns out that it was a typo and category for this fact record should have been "Food"

Possible actions:

Option A: go back to the source system and fix the category to match one of the existing categories. Next time ETL runs - it picks up existing category and updates the fact table. Getting a total of Food category includes that fact record when update is complete.

Option B: in the dimension table update the CategoryName field to "Food" and leave inferred member. Do not correct in the source system - in the reports group by CategoryName instead of the category key.

If the correction in source system is not possible - what are my other options for handling this scenario?


Solution

  • Both, actually.

    Option A should be implemented. Because if there's a typo detected upstream, when possible, it should be fixed.

    However, Option B touches a very important point: Grouping of report data should be done on attributes, not keys. Regardless of having one or many "Food" entries, you should always group by category name, and not the category key. Think SCD Type II: if you group by key, then facts pertaining to different versions of the same dimension record will be grouped in different rows, which is not what we normally want.