Tags: t-sql, database-design, foreign-keys, sql-server-2000, validation

How to manage foreign key errors from insert for the purpose of data validation (t-sql)


I am building a database in SQL Server 2000 and need to perform data validation by testing for foreign key violations. This post is related to an earlier post I made (Trigger exits on first failed insert and cant set xact_abort OFF in SQL Server 2000), which focused on how to port a working SQL Server 2005 implementation to SQL Server 2000. Following the advice received on that post, which indicated that wholesale recoding was required, I am now reconsidering the design itself - hence this post. To recap, my application works as follows:

  • I receive a daily data feed containing ~5k records into a Staging table. When this insert is done, a single record is then added to a table called TRIGGER_DATA.
  • I have created an ‘on insert’ trigger on this table which then attempts to insert the staged data into a FACT_data table one record at a time.
  • The FACT_data table is foreign-keyed to many DIM tables which define the acceptable inputs each field can take.
  • If any record violates a foreign key constraint, the insert should fail and the record should instead be inserted into a Load_error table (which has no foreign keys and in which all fields are nullable). A sketch of this per-record pattern follows the list.
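For concreteness, here is a minimal sketch of that per-record pattern for SQL Server 2000, which has no TRY/CATCH, so failures must be detected via @@ERROR after each statement. All object and column names here (Stage_data, FACT_data, Load_error, stage_id, field1) are placeholders for the real schema:

```sql
-- Per-record load sketch (SQL Server 2000 style: no TRY/CATCH).
-- Names are placeholders; substitute your own tables and columns.
SET XACT_ABORT OFF   -- a FK violation then kills the statement, not the batch

DECLARE @stage_id int, @field1 varchar(50), @err int

DECLARE stage_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT stage_id, field1 FROM Stage_data

OPEN stage_cur
FETCH NEXT FROM stage_cur INTO @stage_id, @field1

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Attempt the insert into the FK-constrained table.
    INSERT INTO FACT_data (stage_id, field1) VALUES (@stage_id, @field1)
    SET @err = @@ERROR   -- capture immediately; @@ERROR resets on the next statement

    IF @err <> 0
        -- FK violation (or other failure): divert the record to the
        -- unconstrained, all-nullable error table instead.
        INSERT INTO Load_error (stage_id, field1) VALUES (@stage_id, @field1)

    FETCH NEXT FROM stage_cur INTO @stage_id, @field1
END

CLOSE stage_cur
DEALLOCATE stage_cur
```

Note that this only behaves as intended outside a trigger: with XACT_ABORT OFF, a FK violation (error 547) terminates the statement but the batch continues, whereas inside a SQL Server 2000 trigger the first failure aborts everything - which is exactly the problem from the earlier post.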

Given the volume of records in each insert, I thought it would be a bad idea to create the trigger on the Stage_data table, since this would result in ~5k trigger firings in one go each day. However, since I cannot set xact_abort off in a trigger under SQL Server 2000, and the trigger therefore aborts on the first failed insert, I am wondering whether it might actually be a half-decent solution.

Questions:

  1. The basic question I am now asking myself is: what is the typical approach for doing this? It seems to me that this kind of data validation through checking for FK violations must be common, and therefore a consensus best practice may have emerged (although I really can't find any for the SQL Server 2000 platform!).
  2. Am I correct that a trigger on the Stage_data table would be bad practice given the volume of records in each insert, or is it acceptable?
  3. Is my approach of looping through each record from within the trigger and testing each insert OK?
  4. What are your thoughts on this alternative I have just thought of? Stop using triggers altogether and, after the Stage table is loaded, add a record to a 'stack' table saying that data has been received and is ready to be validated and loaded into the FACT table (perhaps along with a priority level indicating the order in which tasks must be processed). This stack or 'job' table would then be a register of all requested inserts along with their status (created/in-progress/completed). I would then have a stored procedure continually poll this table and process the top-priority record. This would mean that all stored proc calls happen outside the trigger. A sketch of such a table follows this list.
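To make the alternative in question 4 concrete, here is one possible shape for the 'stack'/'job' table together with the claim step of the polling procedure. The table name LOAD_JOB and its columns are hypothetical, and the UPDLOCK/READPAST hints are one way (available in SQL Server 2000) to stop overlapping polls from claiming the same record:

```sql
-- Hypothetical 'stack'/'job' table for the polling design in question 4.
CREATE TABLE LOAD_JOB (
    job_id     int IDENTITY(1,1) PRIMARY KEY,
    priority   int NOT NULL DEFAULT 5,
    status     varchar(12) NOT NULL DEFAULT 'created',  -- created / in-progress / completed
    created_at datetime NOT NULL DEFAULT GETDATE()
)

-- Claim step of the polling stored procedure: atomically pick the
-- top-priority pending job so overlapping polls cannot grab the same row.
DECLARE @job_id int

BEGIN TRAN
    SELECT TOP 1 @job_id = job_id
    FROM LOAD_JOB WITH (UPDLOCK, READPAST)
    WHERE status = 'created'
    ORDER BY priority, job_id

    IF @job_id IS NOT NULL
        UPDATE LOAD_JOB SET status = 'in-progress' WHERE job_id = @job_id
COMMIT

-- ... if @job_id is not null, validate and load the staged data here,
-- then: UPDATE LOAD_JOB SET status = 'completed' WHERE job_id = @job_id
```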

Many thanks


Solution

  • You don't need a trigger at all. Unless there is some reason that you need split-second timing of this daily data load, just schedule a job (stored proc) that runs as often as necessary to look for data in the staging table.

    When it finds any, process the records one at a time and load the ones that are OK and do whatever you do with the ones that have broken FKs (delete, move to a work queue, etc.).

    If you use a schedule frequency that is often enough that there is some risk of the next job starting while the last one is still running, then you should create a sentinel table that your stored proc can write into to say that the job is running. This could work one of two ways: either you have just one record that says "running" or "not running", or you have one record per job (like a transaction log) with a status code indicating whether the job is complete or not. A minimal sketch of the single-record variant follows.
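Here is that single-record variant, with assumed names (LOAD_SENTINEL, is_running); the UPDATE's row count doubles as an atomic test-and-set, so two overlapping runs cannot both claim the sentinel:

```sql
-- One-row sentinel table (names are assumptions, not from the answer).
CREATE TABLE LOAD_SENTINEL (is_running bit NOT NULL)
INSERT INTO LOAD_SENTINEL (is_running) VALUES (0)

-- At the top of the scheduled stored procedure: claim the sentinel,
-- bailing out if a previous run is still in progress.
UPDATE LOAD_SENTINEL SET is_running = 1 WHERE is_running = 0
IF @@ROWCOUNT = 0
    RETURN   -- a previous run is still active; let it finish

-- ... process the staging table here ...

UPDATE LOAD_SENTINEL SET is_running = 0   -- release the sentinel
```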