Search code examples
sqldatabase-schema

SQL schema pattern for keeping history of changes


Consider a database that maintains a list of persons and their contact information, including addresses and such.

Sometimes, the contact information changes. Instead of simply updating the single person record to the new values, I like to keep a history of the changes.

I like to keep the history in a way that when I look at a person's record, I can quickly determine that there are older recordings of that person's data as well. However, I also like to avoid having to build very complicated SQL queries for retrieving only the latest version of each person's records (while this may be easy with a single table, it quickly gets difficult once the table is connected to other tables).

I've come up with a few ways, which I'll add below as answers, but I wonder if there are better ways (While I'm a seasoned code writer, I'm rather new to DB design, so I lack the experience and already ran into a few dead ends).

Which DB? I am currently using sqlite but plan to move to a server based DB engine eventually, probably Postgres. However, I meant this question asked in a more general form, not specific to any particular engine, though suggestions how to solve this in certain engines are appreciated, too, in the general interest.


Solution

  • Must you keep structured history information?

    • Quite often, the history of changes does not have to be structured, because the history is needed for auditing purposes only, and there is no actual need to be able to perform queries against the historical data.

      • So, what quite often suffices is to simply log each modification that is made to the database, for which you only need a log table with a date-time field and some variable length text field into which you can format human-readable messages as to who changed what, and what the old value was, and what the new value is.
      • Nothing needs to be added to the actual data tables, and no additional complexity needs to be added to the queries.

    If you must keep structured history information:

    • If you need to able to execute queries against historical data, then you must keep the historical data in the database. Some people recommend separate historical tables; I consider this misguided. Instead, I recommend using views.

      • Rename each table from "NAME" to "NAME_HISTORY" and then create a view called "NAME" which presents to you only the latest records.
      • Views are a feature which exists in most RDBMSes. A view looks like a table, so you can query it as if it was a table, but it is read-only, and it can be created by simply defining a query on existing tables (and views.)
      • So, with a query which orders the rows by history-date, groups by all fields except history-date, selects all fields except history-date, and picks only the first row, you can create a view that looks exactly like the original table before historicity was added.
      • Any existing code which just performs queries and does not need to be aware of history will continue working as before.
      • Code that performs queries against historical data, and code that modifies tables, will now need to start using "NAME_HISTORY" instead of "NAME".
      • It is okay if code which modifies the table is burdened by having to refer to the table as "NAME_HISTORY" instead of "NAME", because that code will also have to take into account the fact that it is not just updating the table, it is appending new historical records to it.
      • As a matter of fact, since views are read-only, the use of views will prevent you from accidentally modifying a table without taking care of historicity, and that's a good thing.