node.js · sql-server · scheduled-tasks · job-scheduling · node-cron

How to stop duplicate execution of scheduler jobs in cluster


We have a Node.js application that uses scheduled jobs; one of my jobs runs six times a day. We are now upgrading the application to a cluster-based deployment, i.e. a master process and worker processes. After doing that, I can see that the jobs are duplicated and executed multiple times.

  1. I have to remove the duplicate executions
  2. If the master fails to run the job, or the master app is stopped, a worker should execute the job

Note: we are using SQL Server as the database.
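
The duplication is easy to reproduce in miniature. This is only an in-memory simulation (plain objects stand in for cluster workers and for node-cron), but it shows the mechanism: every worker process registers its own copy of the schedule, so a single scheduled moment fires one handler per worker.

```javascript
// In-memory simulation of why clustering duplicates scheduled jobs.
// "worker" here stands in for a cluster worker process.
function makeScheduler() {
  const handlers = [];
  return {
    schedule(fn) { handlers.push(fn); },      // each worker calls this on startup
    tick() { handlers.forEach(fn => fn()); }  // the scheduled time arrives once
  };
}

const scheduler = makeScheduler();
let executions = 0;
const job = () => { executions += 1; };

// four workers, each unaware of the others, register the same job
for (let worker = 0; worker < 4; worker++) {
  scheduler.schedule(job);
}

scheduler.tick();
console.log(executions); // 4 executions of a job that should have run once
```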


Solution

  • First

    A relational database is the worst possible place to implement this, especially if the queue is very dynamic. Storage engines built around full-table locks (for example MySQL's MEMORY and MyISAM engines) serialize every writer, while more suitable MVCC engines like InnoDB have a higher write penalty (to provide ACID properties) and are optimized for accessing records that are spatially and temporally close, which is not the access pattern of a job queue.

    Solution With SQL SERVER

    However, if you insist on using SQL Server, here is the solution.

    The standard way to implement a job queue in a relational database is a skip-locked read: each consumer locks the rows it claims, and other consumers skip locked rows instead of blocking on them. Conceptually, each message in the queue table is a node that the consumers together must visit exactly once. In SQL Server this is expressed with the UPDLOCK and READPAST table hints (PostgreSQL and MySQL 8+ call it SKIP LOCKED). Start with a job-queue schema along these lines:

    CREATE TABLE Queue (
        QueueMsgId   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
        QueueMsgType varchar(20)       NOT NULL,
        QueueState   char(1)           NOT NULL DEFAULT 'N',  -- 'N' new/queued, 'A' active/processing, 'C' completed
        CreateTime   datetime          NOT NULL DEFAULT GETDATE(),
        QueueMsg     varchar(255)      NULL
    );
    

    The queue client pulls one message and changes its state to 'A' (active) while it works on it; when it is done, it changes the state to 'C' (completed). READPAST makes a query skip rows that are locked by other transactions instead of waiting on them, and UPDLOCK takes update locks on the rows that are read so no other consumer can claim them; together they give the behaviour that PostgreSQL and MySQL spell FOR UPDATE SKIP LOCKED.

    If multiple consumers poll the queue concurrently, we must ensure that no two of them claim the same rows. Run the claim inside a transaction, so the update locks are held until the state change is committed. The following SELECT locks the claimed queue rows exclusively, while READPAST lets other consumers skip past them:

    SELECT TOP (2)
        q.QueueMsgId AS id1,
        q.QueueMsgType AS msgType1,
        q.QueueState AS state1
    FROM
        Queue q WITH (UPDLOCK, READPAST)
    WHERE
        q.QueueState = 'N'
    ORDER BY
        q.QueueMsgId;
    

    Multiple consumers can execute the same query concurrently and there will be no conflict over the Queue rows: each consumer skips the rows another has already locked.
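
    The claim-and-skip lifecycle above (new → active → completed) can be sketched in plain JavaScript. This is only an in-memory stand-in for the UPDLOCK/READPAST behaviour, not the real locking; Node's single-threaded event loop makes the find-and-mark step atomic here, where in SQL Server the hints provide the same guarantee across processes.

```javascript
// In-memory stand-in for the UPDLOCK + READPAST claim pattern: a consumer
// claims the first 'N' (new) row and marks it 'A' (active); other consumers
// skip rows that are no longer 'N', so no message is processed twice.
const queue = [
  { id: 1, state: 'N' },
  { id: 2, state: 'N' },
  { id: 3, state: 'N' }
];

function claimNext() {
  const row = queue.find(r => r.state === 'N'); // READPAST: skip claimed rows
  if (!row) return null;
  row.state = 'A';                              // UPDLOCK: claim exclusively
  return row;
}

function complete(row) {
  row.state = 'C';                              // done processing
}

// two concurrent consumers each claim a message: never the same one
const a = claimNext();
const b = claimNext();
console.log(a.id, b.id); // two different ids
complete(a);
complete(b);
```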

    Best Solution

    Dump SQL and shift to Redis for high queries-per-second throughput. Redis's data structures are awkward to coordinate concurrently by hand, but Node.js has an out-of-the-box job/message-queue solution built on top of it: Bull. For cluster-based deployment, check this example.
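
    A minimal sketch of the Bull approach follows; it assumes `npm install bull` and a Redis server on localhost:6379, and the queue name, job name, and cron expression are made up for illustration.

```javascript
// Sketch only: requires the `bull` package and a running Redis server.
const Queue = require('bull');

const jobs = new Queue('scheduled-jobs', 'redis://127.0.0.1:6379');

// Every process in the cluster can register this processor. Bull stores the
// repeatable job once in Redis and hands each run to exactly one worker,
// which also covers the failover requirement: if one process dies, another
// worker picks up the next run.
jobs.process('report', async (job) => {
  // ...do the actual work here...
});

// Registering the same repeatable job from several processes is idempotent:
// Bull keys repeatable jobs by name + repeat options, so only one copy exists.
jobs.add('report', {}, { repeat: { cron: '0 */4 * * *' } });
```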