I have a Person Satellite with a Gender attribute. From source systems the values for this attribute can be: F, M, FEMALE, or MALE. Which of the two following approaches is the correct one for Data Vault modeling?
Store data in Gender as it comes from sources and in the Business Vault or Data Marts standardize the values to FEMALE and MALE only
Create a cross-reference table to map out F to FEMALE and M to MALE, while loading the Person Satellite, transform F to FEMALE and M to MALE using the cross-reference table.
I'm using Amazon Redshift that supports column compression.
I emailed Daniel Linstedt, creator of the Data Vault modeling method, to ask him the same question. His answer:
"I typically store it as it comes in, THEN translate it on the way to the Business DV. This way, if the business ever changes it's mind, we can re-write the translation rule without affecting history. But more than that, I've seen source systems that deliver values outside the boundaries of what's acceptable. Do not try to translate on the way in to the Raw DV, to do so would destroy auditability."