Search code examples
sql-serverperformancesql-server-2008lockingsubsonic3

SQL Server lock/hang issue


I'm using SQL Server 2008 on Windows Server 2008 R2, all sp'd up.

I'm getting occasional issues with SQL Server hanging with the CPU usage on 100% on our live server. It seems all the wait time on SQL Sever when this happens is given to SOS_SCHEDULER_YIELD.

Here is the Stored Proc that causes the hang. I've added the "WITH (NOLOCK)" in an attempt to fix what seems to be a locking issue.

ALTER PROCEDURE [dbo].[MostPopularRead]
AS
BEGIN
SET NOCOUNT ON;

SELECT 
    c.ForeignId , ct.ContentSource as ContentSource
    , sum(ch.HitCount * hw.Weight) as Popularity
    , (sum(ch.HitCount * hw.Weight) * 100) / @Total as Percent
    , @Total as TotalHits
from 
    ContentHit ch WITH (NOLOCK)
    join [Content] c WITH (NOLOCK) on ch.ContentId = c.ContentId
    join HitWeight hw WITH (NOLOCK) on ch.HitWeightId = hw.HitWeightId
    join ContentType ct WITH (NOLOCK) on c.ContentTypeId = ct.ContentTypeId
where 
    ch.CreatedDate between @Then and @Now
group by
    c.ForeignId , ct.ContentSource
order by
    sum(ch.HitCount * hw.HitWeightMultiplier) desc
END

The stored proc reads from the table "ContentHit", which is a table that tracks when content on the site is clicked (it gets hit quite frequently - anything from 4 to 20 hits a minute). So its pretty clear that this table is the source of the problem. There is a stored proc that is called to add hit tracks to the ContentHit table, its pretty trivial, it just builds up a string from the params passed in, which involves a few selects from some lookup tables, followed by the main insert:

BEGIN TRAN
insert into [ContentHit] 
    (ContentId, HitCount, HitWeightId, ContentHitComment)
values
    (@ContentId, isnull(@HitCount,1), isnull(@HitWeightId,1), @ContentHitComment)
COMMIT TRAN

The ContentHit table has a clustered index on its ID column, and I've added another index on CreatedDate since that is used in the select.

When I profile the issue, I see the Stored proc executes for exactly 30 seconds, then the SQL timeout exception occurs. If it makes a difference the web application using it is ASP.NET, and I'm using Subsonic (3) to execute these stored procs.

Can someone please advise how best I can solve this problem? I don't care about reading dirty data...

EDIT: The MostPopularRead stored proc is called very infrequently - its called on the home page of the site, but the results are cached for a day. The pattern of events that I am seeing is when I clear the cache, multiple requests come in for the home site, and they all hit the stored proc because it hasn't yet been cached. SQL Server then maxes out, and can only be resolved by restarting the sql server process. When I do this, usually the proc will execute OK (in about 200 ms) and put the data back in the cache.

EDIT 2: I've checked the execution plan, and the query looks quite sound. As I said earlier when it does run it only takes around 200ms to execute. I've added MAXDOP 1 to the select statement to force it to use only one CPU core, but I still see the issue. When I look at the wait times I see that XE_DISPATCHER_WAIT, ONDEMAND_TASK_QUEUE, BROKER_TRANSMITTER, KSOURCE_WAKEUP and BROKER_EVENTHANDLER are taking up a massive amount of wait time.

EDIT 3: I previously thought that this was related to Subsonic, our ORM, but having switched to ADO.NET, the erros is still live.


Solution

  • Remove the NOLOCK hint.

    Open a query in SSMS, run SET STATISTICSIO ON and run the query in the procedure. Let it finish and post here the IO stats messages. Then post the table definitions and all indexes defined on them. Then somebody will be able to reply with the proper indexes you need.

    As with all SQL performance problem, the text of the query is largely irrelevant without complete schema definition.

    A guesstimate covering index would be:

    create index ContentHitCreatedDate 
       on ContentHit (CreatedDate) 
       include (HitCount, ContentId,  HitWeightId);
    

    Update

    XE_DISPATCHER_WAIT, ONDEMAND_TASK_QUEUE, BROKER_TRANSMITTER, KSOURCE_WAKEUP and BROKER_EVENTHANDLER: you can safely ignore all these waits. They show up because they represent threads parked and waiting to dispatch XEvents, Service Broker or internal SQL thread pool work items. As they spend most of their time parked and waiting, they get accounted for unrealistic wait times. Ignore them.