Search code examples
t-sqltransactionssql-server-2008-r2lockinghint

Why are lock hints needed on an atomic statement?


Question

What is the benefit of applying locks to the below statement?

Similarly, what issue would we see if we didn't include these hints? i.e. Do they prevent a race condition, improve performance, or maybe something else? Asking as perhaps they're included to prevent some issue I've not considered rather than the race condition I'd assumed.

NB: This is an overflow from a question asked here: SQL Threadsafe UPDATE TOP 1 for FIFO Queue

The Statement In Question

WITH nextRecordToProcess AS
(
    SELECT TOP(1) Id, StatusId
    FROM    DemoQueue
    WHERE   StatusId = 1 --Ready for processing
    ORDER BY DateSubmitted, Id 
)
UPDATE nextRecordToProcess
SET StatusId = 2 --Processing
OUTPUT Inserted.Id 

Requirement

  • The SQL is used to retrieve an unprocessed record from a queue.
  • The record to be obtained should be the first record in the queue with status Ready (StatusId = 1).
  • There may be multiple workers/sessions processing messages from this queue.
  • We want to ensure that each record in the queue is only picked up once (i.e. by a single worker), and that each worker processes messages in the order in which they appear in the queue.
  • It's OK for one worker to work faster than another (i.e. if Worker A picks up record 1 then Worker B picks up record 2 it's OK if worker B completes the processing of record 2 before Worker A has finished processing record 1). We're only concerned within the context of picking up the record.
  • There's no ongoing transaction; i.e. we just want to pick up the record from the queue; we don't need to keep it locked until we come back to progress the status from Processing to Processed.

Additional SQL for Context:

CREATE TABLE Statuses
(
    Id SMALLINT NOT NULL PRIMARY KEY CLUSTERED
    , Name NVARCHAR(32) NOT NULL UNIQUE
)
GO
INSERT Statuses (Id, Name)
VALUES (0,'Draft')
, (1,'Ready')
, (2,'Processing')
, (3,'Processed')
, (4,'Error')
GO
CREATE TABLE DemoQueue
(
    Id BIGINT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED
    , StatusId SMALLINT NOT NULL FOREIGN KEY REFERENCES Statuses(Id)
    , DateSubmitted DATETIME --will be null for all records with status 'Draft'
)
GO

Suggested Statement

In the various blogs discussing queues, and in the question which caused this discussion, it's suggested that the above statement be changed to include lock hints as below:

WITH nextRecordToProcess AS
(
    SELECT TOP(1) Id, StatusId
    FROM    DemoQueue WITH (UPDLOCK, ROWLOCK, READPAST)
    WHERE   StatusId = 1 --Ready for processing
    ORDER BY DateSubmitted, Id 
)
UPDATE nextRecordToProcess
SET StatusId = 2 --Processing
OUTPUT Inserted.Id 

My Understanding

I understand that were locking required the benefits of these hints would be:

  • UPDLOCK: Because we're selecting the record to update it's status we need to ensure that any other sessions reading this record after we've read it but before we've updated it won't be able to read the record with the intent to update it (or rather, such a statement would have to wait until we've performed our update and released the lock before the other session could see our record with its new value).
  • ROWLOCK: Whilst we're locking the record, we want to ensure that our lock only impacts the row we're locking; i.e. as we don't need to lock many resources / we don't want to impact other processes / we want other sessions to be able to read the next available item in the queue even if that item's in the same page as our locked record.
  • READPAST: If another session is already reading an item from the queue, rather than waiting for that session to release it's lock, our session should pick the next available (not locked) record in the queue.

i.e. Were we running the below code I think this would make sense:

DECLARE @nextRecordToProcess BIGINT

BEGIN TRANSACTION

SELECT TOP (1) @nextRecordToProcess = Id
FROM    DemoQueue WITH (UPDLOCK, ROWLOCK, READPAST)
WHERE   StatusId = 1 --Ready for processing
ORDER BY DateSubmitted, Id 

--and then in a separate statement

UPDATE DemoQueue
SET StatusId = 2 --Processing
WHERE Id = @nextRecordToProcess

COMMIT TRANSACTION

--@nextRecordToProcess is then returned either as an out parameter or by including a `select @nextRecordToProcess Id`

However when the select and update occur in the same statement I'd have assumed that no other session could read the same record between our session's read & update; so there'd be no need for explicit lock hints.

Have I misunderstood something fundamentally with how locks work; or is the suggestion for these hints related to some other similar but different use case?


Solution

  • John is right in as these are optimizations, but in SQL world these optimizations can mean the difference between 'fast' vs. 'unbearable size-of-data slow' and/or the difference between 'works' vs. 'unusable deadlock mess'.

    The readpast hint is clear. For the other two, I feel I need to add a bit more context:

    • ROWLOCK hint is to prevent page lock granularity scans. The lock granularity (row vs. page) is decided upfront when the query starts and is based on an estimate of the number pages that the query will scan (the third granularity, table, will only be used in special cases and does not apply here). Normally dequeue operations should never have to scan so many pages so that page granularity is considered by the engine. But I've seen 'in the wild' cases when the engine decided to use page lock granularity, and this leads to blocking and deadlocks in dequeue
    • UPDLOCK is needed to prevent the upgrade lock deadlock scenario. The UPDATE statement is logically split into a search for the rows that need to be updated and then update the rows. The search needs to lock the rows it evaluates. If the row qualifies (meets the WHERE condition) then the row is updated, and update is always an exclusive lock. So the question is how do you lock the rows during the search? If you use a shared lock then two UPDATE will look at the same row (they can, since the shared lock allows them), both decide the row qualifies and both try to upgrade the lock to exclusive -> deadlock. If you use exclusive locks during the search the deadlock cannot happen, but then UPDATE will conflict on all rows evaluated with any other read, even if the row does not qualifies (not to mention that Exclusive locks cannot be released early w/o breaking two-phase-locking). This is why there is an U mode lock, one that is compatible with Shared (so that UPDATE evaluation of candidate rows does not block reads) but is incompatible with another U (so that two UPDATEs do not deadlock). There are two reasons why the typical CTE based dequeue needs this hint:

      1. because is a CTE the query processing does not understand always that the SELECT inside the CTE is the target of an UPDATE and should use U mode locks and
      2. the dequeue operation will always go after the same rows to update (the rows being 'dequeued') so deadlocks are frequent.