Search code examples
sqlconcurrent-processing

Getting a Chunk of Work


Recently I had to deal with a problem that I imagined would be pretty common: given a database table with a large (million+) number of rows to be processed, and various processors running in various machines / threads, how to safely allow each processor instance to get a chunk of work (say 100 items) without interfering with one another?

The reason I am getting a chunk at a time is for performance reasons - I don't want to go to the database for each item.


Solution

  • There are a few approaches - you could associate each processor a token, and have a SPROC that sets that token against the next [n] available items; perhaps something like:

    (note - needs suitable isolation-level; perhaps serializable: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE)

    (edited to fix TSQL)

    UPDATE TOP (1000) WORK
    SET [Owner] = @processor, Expiry = @expiry
    OUTPUT INSERTED.Id -- etc
    WHERE [Owner] IS NULL
    

    You'd also want a timeout (@expiry) on this, so that when a processor goes down you don't lose work. You'd also need a task to clear the owner on things that are past their Expiry.