When implementing a bitemporal database in SQL, it is usually recommended to use the following timestamps:
I have used this approach a few times before, but I have always wondered why having only 3 timestamps, leaving TransactionEnd out, isn't just as correct an implementation. Here a transaction time range spans from TransactionStart to the next TransactionStart.
Are there any strong arguments for not only using 3 timestamps, which will limit the size of the database?
As mentioned in a comment it's for simplicity, since it's somewhat harder to make certain queries without it.
Consider the following example. John
is born in some location, Location1
, on January first 1990, but is first registered to be born on the fifth.
The database table, Persons
, now looks like this:
+----------+--------------+------------+----------+------------+----------+
| Name | Location | valid_from | valid_to | trans_from | trans_to |
+----------+--------------+------------+----------+------------+----------+
| John | Location1 | 01-01-1990 |99-99-9999| 05/01/1990 |99-99-9999|
+----------+--------------+------------+----------+------------+----------+
At this point, removing the trans_to
column wouldn't cause too much trouble, but suppose the following:
After some years, say 20, John
relocates to Location2
, and inform the officials 20 days later.
This will make the Persons
table look like this
+----------+--------------+------------+----------+------------+----------+
| Name | Location | valid_from | valid_to | trans_from | trans_to |
+----------+--------------+------------+----------+------------+----------+
| John | Location1 | 01-01-1990 |99-99-9999| 05/01/1990 |20-01-2010|
| John | Location1 | 01-01-1990 |01-01-2010| 20/01/2010 |99-99-9999|
| John | Location2 | 01-01-2010 |99-99-9999| 20/01/2010 |99-99-9999|
+----------+--------------+------------+----------+------------+----------+
Suppose someone wanted to find out "Where does the system think John is living now" (transaction time), regardless of where he actually lives. This can (roughly) be queried in SQL in the following way
Select Location
From Persons
Where Name = John AND trans_from > NOW AND trans_to < NOW
Suppose the transaction end time was removed
+----------+--------------+------------+----------+------------+
| Name | Location | valid_from | valid_to | trans_from |
+----------+--------------+------------+----------+------------+
| John | Location1 | 01-01-1990 |99-99-9999| 05/01/1990 |
| John | Location1 | 01-01-1990 |01-01-2010| 20/01/2010 |
| John | Location2 | 01-01-2010 |99-99-9999| 20/01/2010 |
+----------+--------------+------------+----------+------------+
The query above is of course no longer valid, but making logic for the same query in the last table would be somewhat difficult. Since the trans_to
is missing it will have to be derived from the other rows in the table. For instance the implicit trans_to
time for the first row (since its the oldest entry) is the trans_from
from the second row, which is the newer of the two.
The transaction end time is thus either 9999-99-99
, if the row is the newest, or it's the trans_from
from the row immediately succeeding it.
This means that the data concerning a specific row, is not entirely kept in that row, and the rows form a dependency on each other, which is (of course) unwanted. Furthermore it can be quite difficult to determine which exact row is the immediate successor of a row, which can make the queries even more complex