We utilize SSIS (SQL/BIDS 2008) to generate close to 1000 reports per day. The same package is scheduled 5 times in SQL Agent, on staggered schedules.
The SSIS queries a Report
table, and if the report hasn't run yet on the given date (check ReportLog
) and that report is not running currently (check WorkQueue
) the first (top) report is picked up, and ran.
That query is effectively:
SELECT TOP 1 R.ReportID
FROM Report R
LEFT JOIN ReportLog L ON R.ReportID=L.ReportID AND L.RunDate >= CONVERT(DATE,GETDATE())
LEFT JOIN WorkQueue Q ON R.ReportID=Q.ReportID
WHERE R.Active=1
AND L.ReportID IS NULL
AND Q.ReportID IS NULL
The SSIS package, upon selecting this TOP 1 ReportID
(stored in a SSIS variable), inserts the ReportID
into WorkQueue
, so no other instances of the package will try to use that ReportID
. The insert happens in the next step after the select.
Most times, most days, this works fine. However, every so often two SSIS packages end up running at (within a few milliseconds) the exact same time, and both return the same TOP 1 ReportID
causing issues with one another as they both execute the same procedures, and manipulate the same underlying data.
Right now, we're considering implementing a parent SSIS package which does the select
and places the ReportID
into a "buffer" table along with the Job# of one of the 5 child packages to pick up (rather than the child packages all doing their own select top 1
), but that seems very hacky.
Another option we considered was an instead of insert
trigger on the WorkQueue
table which would raise an error on duplicate insertions.
I am wondering if there is any other way within SSIS to prevent this situation without too much redesign.
How about avoid the trigger and make use of the OUTPUT clause?
INSERT INTO
WorkQueue
OUTPUT Inserted.ReportID
SELECT TOP 1 R.ReportID
FROM Report R
LEFT JOIN ReportLog L ON R.ReportID=L.ReportID AND L.RunDate >= CONVERT(DATE,GETDATE())
LEFT JOIN WorkQueue Q ON R.ReportID=Q.ReportID
WHERE R.Active=1
AND L.ReportID IS NULL
AND Q.ReportID IS NULL;
This should allow you to pop row row into the workQueue table to claim it for the current process and simultaneously retrieve the value for SSIS to use.