Search code examples
sqldatabaseconcurrencylockingjob-queue

The best way to use a DB table as a job queue (a.k.a batch queue or message queue)


I have a databases table with ~50K rows in it, each row represents a job that need to be done. I have a program that extracts a job from the DB, does the job and puts the result back in the db. (this system is running right now)

Now I want to allow more than one processing task to do jobs but be sure that no task is done twice (as a performance concern not that this will cause other problems). Because the access is by way of a stored procedure, my current though is to replace said stored procedure with something that looks something like this

update tbl 
set owner = connection_id() 
where available and owner is null limit 1;

select stuff 
from tbl 
where owner = connection_id();

BTW; worker's tasks might drop there connection between getting a job and submitting the results. Also, I don't expect the DB to even come close to being the bottle neck unless I mess that part up (~5 jobs per minute)

Are there any issues with this? Is there a better way to do this?

Note: the "Database as an IPC anti-pattern" is only slightly apropos here because

  1. I'm not doing IPC (there is no process generating the rows, they all already exist right now) and
  2. the primary gripe described for that anti-pattern is that it results in unneeded load on the DB as processes wait for messages (in my case, if there are no messages, everything can shutdown as everything is done)

Solution

  • Here's what I've used successfully in the past:

    MsgQueue table schema

    MsgId identity -- NOT NULL
    MsgTypeCode varchar(20) -- NOT NULL  
    SourceCode varchar(20)  -- process inserting the message -- NULLable  
    State char(1) -- 'N'ew if queued, 'A'(ctive) if processing, 'C'ompleted, default 'N' -- NOT NULL 
    CreateTime datetime -- default GETDATE() -- NOT NULL  
    Msg varchar(255) -- NULLable  
    

    Your message types are what you'd expect - messages that conform to a contract between the process(es) inserting and the process(es) reading, structured with XML or your other choice of representation (JSON would be handy in some cases, for instance).

    Then 0-to-n processes can be inserting, and 0-to-n processes can be reading and processing the messages, Each reading process typically handles a single message type. Multiple instances of a process type can be running for load-balancing.

    The reader pulls one message and changes the state to "A"ctive while it works on it. When it's done it changes the state to "C"omplete. It can delete the message or not depending on whether you want to keep the audit trail. Messages of State = 'N' are pulled in MsgType/Timestamp order, so there's an index on MsgType + State + CreateTime.

    Variations:
    State for "E"rror.
    Column for Reader process code.
    Timestamps for state transitions.

    This has provided a nice, scalable, visible, simple mechanism for doing a number of things like you are describing. If you have a basic understanding of databases, it's pretty foolproof and extensible.


    Code from comments:

    CREATE PROCEDURE GetMessage @MsgType VARCHAR(8) ) 
    AS 
    DECLARE @MsgId INT 
    
    BEGIN TRAN 
    
    SELECT TOP 1 @MsgId = MsgId 
    FROM MsgQueue 
    WHERE MessageType = @pMessageType AND State = 'N' 
    ORDER BY CreateTime
    
    
    IF @MsgId IS NOT NULL 
    BEGIN 
    
    UPDATE MsgQueue 
    SET State = 'A' 
    WHERE MsgId = @MsgId 
    
    SELECT MsgId, Msg 
    FROM MsgQueue 
    WHERE MsgId = @MsgId  
    END 
    ELSE 
    BEGIN 
    SELECT MsgId = NULL, Msg = NULL 
    END 
    
    COMMIT TRAN