Search code examples
sqldatabasebi-temporal

Is using only 3 timestamps for a bitemporal SQL database possible?


When implementing a bitemporal database in SQL, it is usually recommended to use the following timestamps:

  • ValidStart
  • ValidEnd
  • TransactionStart
  • TransactionEnd

I have used this approach a few times before, but I have always wondered why having only 3 timestamps, leaving TransactionEnd out, isn't just as correct an implementation. Here a transaction time range spans from TransactionStart to the next TransactionStart.

Are there any strong arguments for not only using 3 timestamps, which will limit the size of the database?


Solution

  • As mentioned in a comment it's for simplicity, since it's somewhat harder to make certain queries without it.

    Consider the following example. John is born in some location, Location1, on January first 1990, but is first registered to be born on the fifth.

    The database table, Persons, now looks like this:

    +----------+--------------+------------+----------+------------+----------+
    |   Name   | Location     | valid_from | valid_to | trans_from | trans_to |
    +----------+--------------+------------+----------+------------+----------+
    | John     | Location1    | 01-01-1990 |99-99-9999| 05/01/1990 |99-99-9999|
    +----------+--------------+------------+----------+------------+----------+
    

    At this point, removing the trans_to column wouldn't cause too much trouble, but suppose the following:

    After some years, say 20, John relocates to Location2, and inform the officials 20 days later. This will make the Persons table look like this

    +----------+--------------+------------+----------+------------+----------+
    |   Name   | Location     | valid_from | valid_to | trans_from | trans_to |
    +----------+--------------+------------+----------+------------+----------+
    | John     | Location1    | 01-01-1990 |99-99-9999| 05/01/1990 |20-01-2010|
    | John     | Location1    | 01-01-1990 |01-01-2010| 20/01/2010 |99-99-9999|
    | John     | Location2    | 01-01-2010 |99-99-9999| 20/01/2010 |99-99-9999|
    +----------+--------------+------------+----------+------------+----------+
    

    Suppose someone wanted to find out "Where does the system think John is living now" (transaction time), regardless of where he actually lives. This can (roughly) be queried in SQL in the following way

    Select  Location
    From    Persons
    Where   Name = John AND trans_from > NOW AND trans_to < NOW
    

    Suppose the transaction end time was removed

    +----------+--------------+------------+----------+------------+
    |   Name   | Location     | valid_from | valid_to | trans_from |
    +----------+--------------+------------+----------+------------+
    | John     | Location1    | 01-01-1990 |99-99-9999| 05/01/1990 |
    | John     | Location1    | 01-01-1990 |01-01-2010| 20/01/2010 |
    | John     | Location2    | 01-01-2010 |99-99-9999| 20/01/2010 |
    +----------+--------------+------------+----------+------------+
    

    The query above is of course no longer valid, but making logic for the same query in the last table would be somewhat difficult. Since the trans_to is missing it will have to be derived from the other rows in the table. For instance the implicit trans_to time for the first row (since its the oldest entry) is the trans_from from the second row, which is the newer of the two.

    The transaction end time is thus either 9999-99-99, if the row is the newest, or it's the trans_from from the row immediately succeeding it.

    This means that the data concerning a specific row, is not entirely kept in that row, and the rows form a dependency on each other, which is (of course) unwanted. Furthermore it can be quite difficult to determine which exact row is the immediate successor of a row, which can make the queries even more complex