Search code examples
mysqltransactionsinnodbcommitdatabase-concurrency

Transaction concurrency to prevent read of old version of a record


Let's say I have a table called tasks. Each task has a status. I take one of the tasks that are in the To Manage status, put it in the In Management status, and run the procedure that the task was created for (which may take several seconds to complete).

At the end of the execution, the task may return to To Manage or Completed status, depending on whether the procedure must be run again or not.

Now let's say there are several processes that run this activity at the same time, in order to complete or otherwise handle multiple different tasks together.

I would like to make sure that two processes do not manage the same task at the same time. To accomplish this, the activity described above should be performeded within a transaction:

$db->beginTransaction(); /* transaction A */

/* Reads one task from the database (SELECT query with LIMIT 1) which is in the `To Manage` status and returns it */
$task = $tasks->getNextTask(); /* operation 1 */

/* Changes the status into the `In Management` status (UPDATE query) */
$task->changeStatusToManage(); /* operation 2 */

$db->commit();

$task->execute(); /* operation 3 */

I'm using a MySql database and the table is InnoDB, with READ COMMITTED isolation level: https://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-isolation-levels.html

We say that there is only one task in the To Manage status. If two processes (P1 and P2) are executed simultaneously and that transaction A does not exist, the following could happen:

Instant 1: (operation 1) P1 reads the task id 100 in `To Manage` status
Instant 2: (operation 1) P2 reads the task id 100 in `To Manage` status
Instant 3: (operation 2) P1 puts the task id 100 in the `In Management` status
Instant 4: (operation 2) P2 puts the task id 100 in the `In Management` status
Instant 5: (operation 3) P1 performs the task id 100
Instant 6: (operation 3) P2 performs the task id 100

However, the fact that operations 1-2-3 are actually executed within a transaction, this scenario should be impossible.

  • Can you confirm that it is actually so?
  • Do I need to perform an explicit LOCK for reading of the task table before doing the operation 1 and release it after the operation 2 has been completed?
  • Is there anything else I should do to prevent unexpected results?

The DB structure is much more complicated than the one described above. When I change a task status, I write a log on another table, too. This is done by the code (Model classes) itself. I have the task table, the task_status table with a foreign key on task and a task_status_change (which is the log table). Each txn performs 1 read (get the task), 2 writes (change status and write log). So I need to perform something similar to this (pseudocode):

BEGIN;
$id = SELECT task_id FROM task WHERE task_status_id = 1 LIMIT 1;
UPDATE task SET task_status_id = 2 WHERE task_id = $id;
INSERT INTO task_status_change SET task_id = $id, task_status_id = 2;
COMMIT;

As I mentioned above, I'm using READ COMMITED isolation level. I tried to launch two processes at the same time, running together on the same tasks pool.

Task IDs chosen by the first process (ID and timestamp):

55 1496925510
274 1496925512
384 1496925512
589 1496925513
648 1496925513
1088 1496925513
1990 1496925513

Task IDs chosen by the second process (ID and timestamp):

55 1496925510
274 1496925512
589 1496925512
648 1496925513
810 1496925513
1088 1496925513
2049 1496925514

Thank you


Solution

  • getNextTask should modify the status and get the id of the task it modified in a single transaction. One way (in pseudo code):

    BEGIN;
    $id = SELECT id ... 
            WHERE status = 'idle'
            LIMIT 1 ... FOR UPDATE;
    UPDATE ...  SET status = 'management' WHERE id = $id
    COMMIT;
    

    Depending on your table structure, it may be possible to do the transaction in a single atomic UPDATE statement. (You have not provided much detail.)

    Do something similar for each status transition.

    This provides transactional semantics to a long-running collection of your code, using nothing but one status in the database.

    It sounds like a "queuing" mechanism. I have a mantra: "Don't queue it, just do it." This means that it may be easier/faster/simpler to spawn a worker process whenever you have a task to do, instead of queuing it, etc.