In particular I am dealing with a Type 2 Slowly Changing Dimension and need to represent the time interval a particular record was active for, i.e. for each record I have a StartDate and an EndDate. My question is around whether to use a closed ([StartDate,EndDate]) or half open ([StartDate,EndDate)) interval to represent this, i.e. whether to include the last date in the interval or not. To take a concrete example, say record 1 was active from day 1 to day 5 and from day 6 onwards record 2 became active. Do I make the EndDate for record 1 equal to 5 or 6?
Recently I have come around to the way of thinking that says half open intervals are best based on, inter alia, Dijkstra:Why numbering should start at zero as well as the conventions for array slicing and the range() function in Python. Applying this in the data warehousing context I would see the advantages of a half open interval convention as the following:
Therefore my preference would be to use a half open interval methodology. However if there was some widely adopted industry convention of using the closed interval method then I might be swayed to rather go with that, particularly if it is based on practical experience of implementing such systems rather than my abstract theorising.
I have seen both closed and half-open versions in use. I prefer half-open for the reasons you have stated.
In my opinion the half-open version it makes the intended behaviour clearer and is "safer". The predicate ( a <= x < b ) clearly shows that b is intended to be outside the interval. In contrast, if you use closed intervals and specify (x BETWEEN a AND b) in SQL then if someone unwisely uses the enddate of one row as the start of the next, you get the wrong answer.
Make the latest end date default to the largest date your DBMS supports rather than null.