We are currently working on Data mart design. We are having many Foreign keys to dimension tables. We are thinking whether to allow NULL
in Foreign key dimension fields or have -1 to represent NULL
values.
Kimball suggests to keep default row for NULL
values. http://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/fact-table-null/
My lead suggests to keep NULL
as NULL
.
Will there be any performance impact for keeping NULL
in Foreign key fields ?
Kimball is right (as he usually is). Use a default value where you would use NULL
.
Why? It ensures that joins to the dimensions will not "accidentally" filter rows. Trying to reconcile results from different queries eats up a lot of time. Ensuring that joins succeed is one method of reducing such discrepancies.
If you are not going to follow his advice, then store using NULL
. A value such as -1
is particularly bad -- because it prevents the database from enforcing foreign key constraints.