Search code examples
sql-serversql-server-2008ssisssis-2008

SSIS Race Condition


We utilize SSIS (SQL/BIDS 2008) to generate close to 1000 reports per day. The same package is scheduled 5 times in SQL Agent, on staggered schedules.

  • Job1 - Every 5 min from 12:00
  • Job2 - Every 5 min from 12:01
  • Job3 - Every 5 min from 12:02
  • Job4 - Every 5 min from 12:03
  • Job5 - Every 5 min from 12:04

The SSIS queries a Report table, and if the report hasn't run yet on the given date (check ReportLog) and that report is not running currently (check WorkQueue) the first (top) report is picked up, and ran.

That query is effectively:

SELECT TOP 1 R.ReportID
FROM Report R
LEFT JOIN ReportLog L ON R.ReportID=L.ReportID AND L.RunDate >= CONVERT(DATE,GETDATE())
LEFT JOIN WorkQueue Q ON R.ReportID=Q.ReportID
WHERE R.Active=1
AND L.ReportID IS NULL 
AND Q.ReportID IS NULL

The SSIS package, upon selecting this TOP 1 ReportID (stored in a SSIS variable), inserts the ReportID into WorkQueue, so no other instances of the package will try to use that ReportID. The insert happens in the next step after the select.

Most times, most days, this works fine. However, every so often two SSIS packages end up running at (within a few milliseconds) the exact same time, and both return the same TOP 1 ReportID causing issues with one another as they both execute the same procedures, and manipulate the same underlying data.

Right now, we're considering implementing a parent SSIS package which does the select and places the ReportID into a "buffer" table along with the Job# of one of the 5 child packages to pick up (rather than the child packages all doing their own select top 1), but that seems very hacky.

Another option we considered was an instead of insert trigger on the WorkQueue table which would raise an error on duplicate insertions.

I am wondering if there is any other way within SSIS to prevent this situation without too much redesign.


Solution

  • How about avoid the trigger and make use of the OUTPUT clause?

    INSERT INTO
        WorkQueue
    OUTPUT Inserted.ReportID
    SELECT TOP 1 R.ReportID
    FROM Report R
        LEFT JOIN ReportLog L ON R.ReportID=L.ReportID AND L.RunDate >= CONVERT(DATE,GETDATE())
        LEFT JOIN WorkQueue Q ON R.ReportID=Q.ReportID
    WHERE R.Active=1
        AND L.ReportID IS NULL 
        AND Q.ReportID IS NULL;
    

    This should allow you to pop row row into the workQueue table to claim it for the current process and simultaneously retrieve the value for SSIS to use.