I am trying really hard to understand the concept of having natural keys in date Dimension table.
I have always seen a random surrogate keys being created in the dimension tables. But I have recently read that using a natural key in date dimension something like 20150806
for Aug-06-2015
works much better and has considerable performance gain when it comes to lookup and reverse lookup from
fact tables when compared with a natural surrogate key which is int
by nature.
I can't understand how it will give any performance gain. We still would require join
between facts and dimension even if we use this fancy key for date dimension.
If anyone has any insight on this, would you please mind sharing the knowledge. I would appreciate if you could follow up with an example.
There's no performance gain -- the key is just an arbitrary identifier for the date record. In business apps, I generally use the day offset from January 1 2000, as a smallint. There are slight advantages to using a smaller integer, because the date key is present in so many records, it helps reduce the record size (therefore increasing the number of cacheable records).
The only real advantage to using a "natural" key value for a date is that the table can be use or browsed without necessarily joining to the date dimension table, and has some visible "lexical" meaning.
But it doesn't help in the context of a data warehouse, at all.