sql-server, stored-procedures, isolation-level, database-deadlocks, locks

Deadlocks and NOLOCK issues in SQL Server


We have a stored procedure that inserts and updates bulk data, and we use NOLOCK hints on the SELECTs. Now, when the load gets really high, we are hitting error 601 ("Could not continue scan with NOLOCK due to data movement"). Can we use row versioning?

If yes, which form should we use, given that multiple transactions run simultaneously? One more cost we already have to absorb is the delay from Always On: we keep multiple database servers in sync to handle database failover.

We use NOLOCK because it prevents reads from being deadlocked by other operations, and our application has high transaction rates. Another concern is that we use an Always On availability group of 5 servers, so we already incur a commit delay: a transaction only completes after it has committed on all servers. Should we go with row versioning, and if so, how do we choose which kind? Or can we go for snapshot isolation? Apart from the tempdb usage, what costs would we have to absorb? Is it worth going for?

WITH (UPDLOCK, ROWLOCK) worked for small batches, but when the batch size is as high as 2000 rows it still deadlocks.

This is the piece of code that executes the SP; the same code is hosted on 10 servers, so it is expected to run simultaneously. GetRSExecutionLogLatestID is where the SP is called internally.

    ExecutionLogData execLog = this.fileShareDeliveryWrapper.GetRSExecutionLogLatestID(
        rptPath,
        notification.Owner,
        sharePointSubscriptiondata.RASSubscriptionID,
        this.configRSConnectionString);

    if (execLog != null)
    {
        sharePointSubscriptiondata.RSExecutionLogId = execLog.RSExecutionLogId;
        sharePointSubscriptiondata.RSSubscriptionId = execLog.RSSubscriptionId;
    }

This is the SP:

    ALTER PROCEDURE [dbo].[RAS_USP_GetLatestExecutionLogId]
        @ReportURLPath NVARCHAR(MAX),
        @UserID NVARCHAR(200),
        @RASSubscriptionID NVARCHAR(100)
    AS
    BEGIN

    SET NOCOUNT ON;

    PRINT @UserID; -- debug output

    DECLARE @SubscriptionId UNIQUEIDENTIFIER, @OwnerUserId UNIQUEIDENTIFIER, @LogEntryId BIGINT;

    DECLARE @Subscriptions AS TABLE (SubscriptionID UNIQUEIDENTIFIER NOT NULL,
                                     NotificationID UNIQUEIDENTIFIER NULL,
                                     ReportID UNIQUEIDENTIFIER NULL,
                                     ExtensionSettings XML NOT NULL);

    -- Collect the candidate subscriptions for this report path.
    INSERT INTO @Subscriptions
    SELECT n.SubscriptionID, n.NotificationID, n.ReportID, n.ExtensionSettings
    FROM dbo.Notifications AS n WITH (UPDLOCK, ROWLOCK)
    INNER JOIN Subscriptions AS s WITH (UPDLOCK, ROWLOCK) ON n.SubscriptionID = s.SubscriptionID
    INNER JOIN Catalog AS c WITH (UPDLOCK, ROWLOCK) ON c.ItemID = n.ReportID
    INNER JOIN Users AS um WITH (UPDLOCK, ROWLOCK) ON um.UserID = s.OwnerID
    WHERE c.[Path] = @ReportURLPath; --AND um.UserName=@UserID

    -- Shred the ExtensionSettings XML to find the subscription whose
    -- RASSubscriptionID parameter matches.
    SELECT @SubscriptionId = SubscriptionID
    FROM (SELECT SubscriptionID,
                 NotificationID,
                 ReportID,
                 pValues.pValue.value('Name[1]', 'VARCHAR(50)') AS ParamName,
                 pValues.pValue.value('Value[1]', 'VARCHAR(150)') AS ParamValue
          FROM @Subscriptions
          CROSS APPLY ExtensionSettings.nodes('/ParameterValues/ParameterValue') pValues(pValue)
         ) AS Result
    WHERE ParamName LIKE '%RASSubscriptionID%'
      AND ParamValue = CAST(@RASSubscriptionID AS VARCHAR(100));

    SELECT @OwnerUserId = UserID FROM Users WHERE UserName = @UserID;

    -- Latest execution-log entry for that subscription and owner.
    SELECT @LogEntryId = LogEntryId
    FROM (SELECT TOP 1 LogEntryId
          FROM ExecutionLogStorage a WITH (UPDLOCK, ROWLOCK)
          INNER JOIN [Catalog] b WITH (UPDLOCK, ROWLOCK) ON a.ReportID = b.ItemID
          INNER JOIN [Notifications] n WITH (UPDLOCK, ROWLOCK) ON n.ReportID = a.ReportID AND n.SubscriptionID = @SubscriptionId
          INNER JOIN dbo.Subscriptions s WITH (UPDLOCK, ROWLOCK) ON s.SubscriptionID = n.SubscriptionID
          WHERE [Path] = @ReportURLPath AND n.SubscriptionOwnerID = @OwnerUserId
          ORDER BY TimeEnd DESC) ss;

    SELECT @LogEntryId AS LogEntryId, @SubscriptionId AS SubscriptionID;

    END

Solution

  • SNAPSHOT ISOLATION is pretty much the way to go to prevent reads from being blocked by updates, when you have control over the queries and you need consistency between statements. READ_COMMITTED_SNAPSHOT is another possibility, but it affects all queries in the database, and it doesn't guarantee consistency within a given transaction (just as READ COMMITTED vs. SERIALIZABLE can give different results). NOLOCK is pretty much never the way to go, 99% of the time.
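
    For reference, here's a minimal sketch of how each option is switched on (using the placeholder database name ABC from the error message quoted later; run the ALTERs while no other connections are active):

        -- Opt-in per transaction: only transactions that ask for SNAPSHOT use versioning.
        ALTER DATABASE ABC SET ALLOW_SNAPSHOT_ISOLATION ON;

        -- Alternatively, change the behaviour of plain READ COMMITTED for every query:
        -- ALTER DATABASE ABC SET READ_COMMITTED_SNAPSHOT ON;

        -- A transaction then opts in explicitly:
        SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
        BEGIN TRANSACTION;
            SELECT COUNT(*) FROM dbo.Notifications; -- reads a consistent snapshot, without blocking writers
        COMMIT TRANSACTION;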

    You can think of SNAPSHOT ISOLATION as a viable "read-only" serializable*. There are of course several costs -- note that you will pay many of them even when no snapshot transaction is active, because the versioning data has to be maintained at all times in case one becomes active.

    • Increased activity on tempdb, as you mentioned (a monitoring query follows this list)
    • Rows will take up more space (14 bytes per row)
    • Somewhat more activity on most operations, so depending on the workload there may be a performance hit
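
    A rough way to keep an eye on that tempdb cost is to check the version store's footprint; this sketch uses the standard DMV sys.dm_db_file_space_usage (pages are 8 KB):

        -- Megabytes of tempdb currently reserved for the row-version store.
        SELECT SUM(version_store_reserved_page_count) * 8 / 1024.0 AS version_store_mb
        FROM tempdb.sys.dm_db_file_space_usage;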

    Other considerations:

    • It's best not to update data from a snapshot transaction as you may get unexpected results
    • Certain operations still cannot be done (rebuilding indexes, for example -- reorganizing is okay, but at least in SQL Server 2012, if a query running under snapshot isolation is reading from an index while that index is being rebuilt elsewhere, the read query will fail)
    • For whatever asinine reason, if you use connection pooling, the isolation level doesn't get reset when a connection is reused, which can lead to unexpected results (this applies to all isolation levels, not just SNAPSHOT; a defensive sketch follows this list)
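
    Because of that pooling quirk, it's safest for each batch to state its isolation level explicitly rather than trust whatever the pooled connection last used; a minimal sketch:

        -- State the isolation level at the top of the batch, so a leftover
        -- SNAPSHOT (or SERIALIZABLE, etc.) setting from a previous use of the
        -- pooled connection can't leak in.
        SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
        BEGIN TRANSACTION;
            SELECT COUNT(*) FROM dbo.Subscriptions;
        COMMIT TRANSACTION;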

    This article has a good overview of the situation: https://technet.microsoft.com/en-us/library/ms188277(v=sql.105).aspx

    But ultimately, your only real choices if you need to avoid blocking are row versioning and NOLOCK, and NOLOCK isn't really a choice. Otherwise you may need to design things differently. Other possibilities would be to use hints like UPDLOCK to avoid deadlocks, or to insert/update small batches of data at a time so that the "blocking periods" are short and unnoticeable (see the batching sketch below).
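
    If you go the batching route, the usual pattern is a loop that processes a small chunk per transaction, so locks are held only briefly. A minimal sketch, assuming a hypothetical dbo.WorkQueue table with a Processed flag:

        DECLARE @BatchSize INT = 500;

        WHILE 1 = 1
        BEGIN
            -- Each iteration is its own short transaction, so locks are held
            -- only for the duration of one small batch.
            UPDATE TOP (@BatchSize) dbo.WorkQueue
            SET Processed = 1
            WHERE Processed = 0;

            IF @@ROWCOUNT < @BatchSize
                BREAK; -- last partial batch done; nothing left to process
        END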


    (*) I say this because:

    1. If you open a SNAPSHOT ISOLATION transaction and only read from it, you will get the same read results as you would if you were to use SERIALIZABLE, except that you won't block writers and you won't cause deadlocks
    2. I say "viable" because most of the time, introducing SERIALIZABLE causes too much blocking and too many deadlocks.
    3. I say "read-only" because although you can write within a snapshot transaction, it is a much more complicated endeavour. If you only read from within a snapshot transaction, it's unlikely you'll run into data correctness issues -- indeed, since it's similar to SERIALIZABLE, you may even improve the consistency of your results. The only issues you're likely to run into are performance related. So overall, the stakes are fairly low in terms of it being a breaking change to introduce this to such a query. If you start writing, though:

      • If you try to save changes, and the same data was changed in another transaction, the update will fail with error 3960:

    Snapshot isolation transaction aborted due to update conflict. You cannot use snapshot isolation to access table 'XYZ' directly or indirectly in database 'ABC' to update, delete, or insert the row that has been modified or deleted by another transaction. Retry the transaction or change the isolation level for the update/delete statement.

    This is a new situation you'll have to handle, and it's more difficult than just blocking (though I suppose it's of similar difficulty to dealing with deadlocks).
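
    Handling it usually means wrapping the write in retry logic; a hedged sketch (the table, key, and retry count are placeholders):

        DECLARE @Retries INT = 3;

        WHILE @Retries > 0
        BEGIN
            BEGIN TRY
                SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
                BEGIN TRANSACTION;
                    UPDATE dbo.WorkQueue SET Processed = 1 WHERE Id = 42; -- placeholder write
                COMMIT TRANSACTION;
                BREAK; -- success
            END TRY
            BEGIN CATCH
                IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
                IF ERROR_NUMBER() = 3960 AND @Retries > 1
                    SET @Retries -= 1; -- update conflict: retry against a fresh snapshot
                ELSE
                    THROW; -- anything else (or retries exhausted): re-raise
            END CATCH
        END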

    Imagine that we have a bag containing a mixture of white and black marbles. Suppose that we want to run two transactions. One transaction turns each of the white marbles into black marbles. The second transaction turns each of the black marbles into white marbles. If we run these transactions under serializable isolation, we must run them one at a time. The first transaction will leave a bag with marbles of only one color. After that, the second transaction will change all of these marbles to the other color. There are only two possible outcomes: a bag with only white marbles or a bag with only black marbles.

    If we run these transactions under snapshot isolation, there is a third outcome that is not possible under serializable isolation. Each transaction can simultaneously take a snapshot of the bag of marbles as it exists before we make any changes. Now one transaction finds the white marbles and turns them into black marbles. At the same time, the other transaction finds the black marbles – but only those marbles that were black when we took the snapshot, not those that the first transaction changed to black – and turns them into white marbles. In the end, we still have a mixed bag of marbles with some white and some black. In fact, we have precisely switched each marble.
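
    That write-skew outcome is easy to reproduce with two sessions; a sketch assuming a hypothetical dbo.Marbles(Id, Color) table in a database with ALLOW_SNAPSHOT_ISOLATION on:

        -- Session 1:
        SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
        BEGIN TRANSACTION;
        UPDATE dbo.Marbles SET Color = 'black' WHERE Color = 'white';

        -- Session 2, before session 1 commits:
        --   SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
        --   BEGIN TRANSACTION;
        --   UPDATE dbo.Marbles SET Color = 'white' WHERE Color = 'black';
        --   COMMIT TRANSACTION;

        -- Session 1:
        COMMIT TRANSACTION;

        -- Neither update conflicts: each touched only rows the other transaction
        -- never modified, so both commit and every marble's colour is flipped.
        -- Under SERIALIZABLE, one transaction would have waited and the bag
        -- would have ended up a single colour.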

    In other words, if you stick to only reading from SNAPSHOT transactions, it's much less complicated, so in that sense, I say you can think of it as a viable "read-only" serializable -- if you just stop there, it is a useful tool pretty much anyone can use to get consistent, non-blocking results, without using NOLOCK, as long as they can afford the performance hit. Of course, it is more than that, but to go further, you need to deal with a lot more complexity.