Search code examples
sqlsql-serverdata-warehouse

EndDate on Dimension Table - Should we go with NULL or 99991231 Date Value


I am building a Data Warehouse on SQL Server and I was wondering what is the best approach in handling the current record in a dimension table (SCD type 2) with respect to the 'end_date' attribute. For the current record, we have the option of using a date literal such as '12/31/9999' or specify it as NULL. The dimension tables also have an additional 'current_flag' attribute in addition to 'start_date' and 'end_date'.

It is probably a minor design decision but just wanted to see if there are any advantages of using one over the other in terms of query performance or in any other way?


Solution

  • I have seen systems written both ways. Personally, I go for the infinite end date (but not NULL and the reason is simple: it is easier to validate that the type-2 records are properly tiled, with no gaps or overlaps. I prefer only one validation to two -- the other being the validation of the is_current flag. There is also only one correct way of accessing the data.

    That said, a system that I'm currently working on also publishes a view with only the current records. That is handy.

    That system is not in SQL Server. One optimization that you can attempt is clustering so the current records are all colocated -- assuming they are much more commonly accessed. You can do this using either method. Using a clustered index like this makes updates more expensive, but they can be handy for optimizing memory.