Search code examples
sqldatabaseoracle-databasedatabase-designaudit

General database design - can I reference field names in other tables


I have a table with various fields that can be updated. Let's called it table1 and assume it has these fields:

user VARCHAR2
first_name VARCHAR2
last_name VARCHAR2

My question: I want to keep track of edits to these fields. I have an application that allows these fields to be updated but I want to keep track of when a field is edited, who edited the field and which field was edited - the which is the bit I need help with.

I could have a history table:

date_edited DATE
who_edited VARCHAR2
field_name_edited VARCHAR2

what if the field name got changed though?.. It would mean that field_name_edited would refer to a non-existent field. This seems like a silly approach.

There must be some common way of doing this sort of thing?

Many thanks.

EDIT

I am using an Oracle DB - see new question tags.


Solution

  • "There must be some common way of doing this sort of thing?"

    Not really, because your approach is a common but misguided one. Tracking changes at the column level is unsatisfactory for several reasons:

    1. It gets expensive when an update touches several columns (inserting multiple audit records for each updated row).
    2. It's hard to assemble a coherent picture of the state of the record at a given point in time (programmatically complex, and more expensive to boot).
    3. There's no elegant way to test whether a given column has been changed.
    4. Storing the column value is messy (implementations usually handle different data-types by converting them to strings).

    (Your suggested table doesn't deal with that last point, which I presume is an oversight: there's little value to an audit trail which doesn't track the changed values).

    So, what is the common solution? Row-level history tables. With triggers which insert the whole record into the journalling table (together with those metadata columns like DATE_EDITED and WHO_EDITED). These triggers can easily be generated from the data dictionary.

    It is true that this approach makes it harder to spot which columns were edited in any given transaction. However:

    1. it is hard but not impossible to do this. It can be implemented in SQL (using analytic LAG or something) but actually the human being is pretty good at spotting changes unaided.
    2. the cost of this is borne by the audit sub-system, which in real life - and regardless of what the requirements say - will not be used much, and certainly much less than the core part of the application. So it is better to offload processing costs on to the audit sub-system, rather than making the journalling expensive.

    And, as I said, row-level journalling incurs lower costs than column-level both in generating the audit records and retrieving them.


    In your comment you link to a piece about Oracle's Total Recall product. This is actually a very elegant solution, which has a very low impact on the main system, and which makes it easy to recover the historical state of our tables. The problem is, prior to 12c it is a chargeable extra to the Enterprise Edition which makes it expensive. (In fact it is now part of the Advanced Compression option rather than a separate product in its own right). In 12c Basic Flashback Data Archive (the new name for Total Recall) is available in all editions, but we still need to buy the Advanced Compression Option to compress the journalling tables.


    "You recommend [to store] row edits and thus, store redundand data. "

    Yep. Storage is usually cheap these days. The cost of storing redundant data is usually a good price to pay for efficient writing and retrieval of records.

    " You still need to copy the literal names of the original columns"

    Journalling uses one audit table for each data table; the audit table matches the structure of the live table, with some additional columns to hold metadata relating to the transaction. We would expect the audit columns to have the same names as their analogous data columns. (Doing anything else would be stupid, not least because we can generate DDL for the audit tables from the data dictionary.)