I am starting to build a star schema, and I like it ^^
I have a design problem with dimensional modeling.
I have a Fact table for each transaction in the star schema (highest grain) Something like that (simplified version)
transaction_facts
- id
- account_dim
- date_dim
- status_dim
- amount
status_dim
- id
- code
- description
- final
For a transaction, the status is not clearly defined at process time. Most all of the status fall into these cases:
The last status is the problematic one, since I can receive the confirmation of the transaction few days (up to 10 and sometime, even more) after the original transaction.
How should I handle this kind of late change? Intuitively, I would be tempted to just reaffect the existing transactions to the new dimension, but it make me think of 2 things:
If
Then you can/should update the value.
If using Redshift then you can do this efficiently by writing a batch of updates to a staging table (copy from s3) then applying these all in one go as an update.