I found some multithreaded code in the quite popular LazyCache library that uses an int[] field as a granular locking mechanism, with the intention of preventing concurrent invocation of a method with the same key as argument. I am highly skeptical about the correctness of this code, because no Interlocked or Volatile operation is used when exiting the protected region. Here is the important part of the code:
    private readonly int[] keyLocks;

    public virtual T GetOrAdd<T>(string key, Func<ICacheEntry, T> addItemFactory,
        MemoryCacheEntryOptions policy)
    {
        /* Do stuff */

        object cacheItem;

        // acquire lock per key
        uint hash = (uint)key.GetHashCode() % (uint)keyLocks.Length;
        while (Interlocked.CompareExchange(ref keyLocks[hash], 1, 0) == 1) Thread.Yield();

        try
        {
            cacheItem = CacheProvider.GetOrCreate<object>(key, CacheFactory);
        }
        finally
        {
            keyLocks[hash] = 0;
        }

        /* Do more stuff */
    }
The protected method call is CacheProvider.GetOrCreate<object>(key, CacheFactory). It is supposed to be called by one thread at a time for the same key. For entering the protected region there is a while loop that uses Interlocked.CompareExchange to change a value of the keyLocks array from 0 to 1. So far so good. The part that concerns me is the line that exits the protected region: keyLocks[hash] = 0;. Since there is no barrier there, my understanding is that the C# compiler and the .NET Jitter are free to move instructions in either direction, stepping over this line. So an instruction inside the CacheProvider.GetOrCreate method can be moved after keyLocks[hash] = 0;.
My question is: according to the specs, does the code above really ensure that CacheProvider.GetOrCreate will not be called concurrently with the same key? Is the promise of mutual exclusion fulfilled by this code? Or is the code just buggy?
Context: The relevant code was added in the library in this pull request: Optimize cache to lock per key.
Looks buggy to me; the keyLocks[hash] = 0; is not a release store so parts of Do stuff can reorder out of the critical section, potentially becoming visible to another thread only after it acquires the lock. (Potentially reading already-modified data, or more likely having stores appear late and step on stores from the next thread, or not be seen by its loads.)
It will very likely compile to correct asm on x86, where all asm stores have "release" semantics so only compile-time reordering could break things, but not on ARM / AArch64 or other mainstream ISAs that are weakly ordered. So testing on x86 can't reveal this bug unless you actually do get compile-time reordering. (It's still broken, the bug is just dormant.)
https://preshing.com/20121019/this-is-why-they-call-it-a-weakly-ordered-cpu/ demos a spinlock in C++ that uses relaxed instead of acquire / release, and shows that it breaks in practice on ARM. That example is exactly like this, except here the CAS is like C++ memory_order_seq_cst so the top of the critical section is strong enough. But that's not sufficient; stronger ordering for taking the lock doesn't save you from too weak an unlock.
A basic spinlock needs an acquire RMW to get exclusive ownership, and a release store to unlock, hence the names. That's sufficient to keep Do stuff contained inside the critical section in that direction.
In C#, a release store can be done with Volatile.Write, or via assignment to a volatile field. (Array elements can't be declared volatile, so for keyLocks[hash] that means Volatile.Write.) My understanding is that those are equivalent to C++ foo.store(val, std::memory_order_release).
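As a minimal sketch of what a fixed unlock could look like, assuming the same int[] striping as in the question: StripedSpinLock, Enter and Exit are hypothetical names made up for this illustration, not LazyCache's API. The only change that matters is the Volatile.Write in Exit.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration; not LazyCache's actual code.
public sealed class StripedSpinLock
{
    private readonly int[] keyLocks;

    public StripedSpinLock(int stripes) => keyLocks = new int[stripes];

    // Keys that hash to the same stripe share one lock slot.
    public uint Enter(string key)
    {
        uint hash = (uint)key.GetHashCode() % (uint)keyLocks.Length;
        // Interlocked.CompareExchange is a full fence, so it is at least an acquire.
        while (Interlocked.CompareExchange(ref keyLocks[hash], 1, 0) == 1)
            Thread.Yield();
        return hash;
    }

    // Release store: reads and writes before it cannot be moved after it.
    public void Exit(uint hash) => Volatile.Write(ref keyLocks[hash], 0);
}

public static class Program
{
    public static void Main()
    {
        var locks = new StripedSpinLock(64);
        int counter = 0; // plain, non-atomic; protected only by the lock

        Parallel.For(0, 100_000, _ =>
        {
            uint slot = locks.Enter("same-key");
            try { counter++; }
            finally { locks.Exit(slot); }
        });

        // Equals 100000 as long as mutual exclusion holds.
        Console.WriteLine(counter);
    }
}
```

A stress test like this passing on x86 says little either way, for the reasons above; the point of the sketch is the shape of Exit, not the test.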
Related notes from x86 asm examples and spinlock discussions: the unlock doesn't need to be an Interlocked.Exchange, but it does need to be a release store. Also, this code spins with Thread.Yield() instead of SpinWait.SpinOnce(), which might be good if you have more threads than cores and critical sections tend to take a long time to unlock.
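For comparison, here is a rough sketch of the same kind of lock spinning with SpinWait.SpinOnce() rather than Thread.Yield(). SpinWaitLockDemo and its members are hypothetical names for this illustration. SpinWait busy-spins for a few iterations before it starts yielding, which tends to suit short critical sections; which variant is better for a given workload is something to measure, not assume.

```csharp
using System;
using System.Threading;

// Hypothetical illustration of a single-slot spinlock using SpinWait.
public static class SpinWaitLockDemo
{
    private static int flag;    // 0 = free, 1 = held
    private static int counter; // plain field, protected only by the lock

    public static void Enter()
    {
        var spinner = new SpinWait(); // fresh backoff state per acquisition
        while (Interlocked.CompareExchange(ref flag, 1, 0) != 0)
            spinner.SpinOnce();       // spins briefly first, then yields
    }

    // The unlock is still a release store, as in the discussion above.
    public static void Exit() => Volatile.Write(ref flag, 0);

    public static int Run(int threads, int iterations)
    {
        counter = 0;
        var workers = new Thread[threads];
        for (int t = 0; t < threads; t++)
        {
            workers[t] = new Thread(() =>
            {
                for (int i = 0; i < iterations; i++)
                {
                    Enter();
                    try { counter++; }
                    finally { Exit(); }
                }
            });
            workers[t].Start();
        }
        foreach (var w in workers) w.Join();
        return counter;
    }

    public static void Main()
    {
        // Equals threads * iterations as long as mutual exclusion holds.
        Console.WriteLine(Run(4, 50_000));
    }
}
```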