Search code examples
c#multithreadinglockingreaderwriterlockslim

"Reimplementing" (cloning) ReaderWriterLockSlim


We have several full lock-s in our web application that could be more performant if they were replaced with read-writer locks - even with the additional overhead of the RW lock, our code would be more parallel. Unfortunately, I cannot use ReaderWriterLockSlim because of what seems like a pretty big synchronization bug that may lead to deadlocks on certain hardware. This article explains the problem in great detail.

The above issue seems to be fixed in .NET Framework 4.0.30319.33440 but not all Windows server versions get this fix. Windows 8.1 and Windows Server 2012 R2 have the fix. Windows Server 2012 (not R2) and Windows Server 2008 R2 still have the bug, even after the latest patches. It looks like Microsoft doesn't plan on fixing this problem on all platforms.

Our various server environments use different versions of Windows Server and I've confirmed that some servers have the fix, while some don't. This means that I cannot safely change all those lock statements to reader-writer locks because on some of our servers the application may deadlock randomly.

As an alternative, I'm looking into reimplementing ReaderWriterLockSlim by decompiling the code in System.Core and creating a new class (something like MyRWLock) that has the one (known) buggy function ExitMyLock fixed. All other code is the same as the original ReaderWriterLockSlim.

This requires the removal of all __DynamicallyInvokable attributes and the addition of 2 or 3 classes / structures that are internal to System.Core. I did all this and have the new lock class compiles without errors.

My question is: can anyone think of any reason that this new class wouldn't work as the original ReaderWriterLockSlim class does? I consider myself fairly good when it comes to threading but I'm not an expert. Since I didn't change any code (other than fixing some type names to point to the new MyRWLock instead of ReaderWriterLockSlim and the attributes), I believe this should work. I do wonder, however, if I forgot about something that may break this in various interesting ways that are hard to debug.

Alternatively: is my (and the linked article's) understanding wrong? Does this problem not need fixing in the first place? The author of that article seems to have done a very detailed analysis which looks correct to me and yet Microsoft didn't apply the change to certain Windows Server versions.

Any thoughts on this would be much appreciated.

Edit:

To add more context for the full locks, here's what they are for. We have a service that reads / writes a remote service when it performs its work. There are many more reads than writes. A read involves 1 network roundtrip and a write involves 3 network roundtrips to the remote service. When a write happens, no read may happen (the write is really a read->delete->add). Right now, we use full lock-s around all of these operations but this means that all the threads that try to read still have to queue up until the current read finishes, even without a write. It seems to me that an RW lock would be ideal for this.


Solution

  • Is the article accurate? I don't know. They at least point to updates in .NET that address the issue they claim exists, so that suggests it is. On the other hand, there is still the question as to whether it applies to you. Desktop x86 machines have volatile semantics for all memory access anyway, so if the only bug describes is the lack of an explicit volatile operation, that wouldn't be a concern on a huge part of the hardware out there.

    On big server hardware, the architectures are different, and they might well have different semantics, requiring explicitly volatile operations. That'd be something you might want to research.

    Of course, regardless of hardware architecture, "volatile" also has meaning to the .NET compilers. The code quoted in the article doesn't seem to me like it would be affected by compiler optimizations, but I'm not an expert in that area so I couldn't say for sure.

    I take it from the question that you have not actually witnessed this bug occur. That it's a purely hypothetical concern at this point.

    I would be wary of trying to reimplement this type of class, even using the original source as a starting point.

    My first choice would be to just ensure I've upgraded to a recent enough .NET version that the bug described in the article you've referenced has been fixed.

    Second choice would be to use the plain ReaderWriterLock class instead of ReaderWriterLockSlim.

    Third choice would be to use the native Win32 "slim reader/writer lock" structure (i.e. via p/invoke or C++/CLI).

    Only if none of those options proved workable would I go to the trouble to implement my own reader-writer lock, and you can bet I would not attempt the "slim" variant (i.e. with all the tricky lock-free stuff...I could see using one volatile flag, but otherwise just stick with conventional Monitor-based code).