Search code examples
c#multithreadingthread-safetyspinlockinterlocked

Is it safe to use the Interlocked.CompareExchange as a lock, only for enter, not for exit?


I found some multithreaded code in the quite popular LazyCache library, that uses an int[] field as a granular locking mechanism, with the intention to prevent concurrent invocation of a method with the same key as argument. I am highly skeptical about the correctness of this code, because there is no Interlocked or Volatile operation used when exiting the protected region. Here is the important part of the code:

private readonly int[] keyLocks;

public virtual T GetOrAdd<T>(string key, Func<ICacheEntry, T> addItemFactory,
     MemoryCacheEntryOptions policy)
{
    /* Do stuff */

    object cacheItem;
    // acquire lock per key
    uint hash = (uint)key.GetHashCode() % (uint)keyLocks.Length;
    while (Interlocked.CompareExchange(ref keyLocks[hash], 1, 0) == 1) Thread.Yield();
    try
    {
        cacheItem = CacheProvider.GetOrCreate<object>(key, CacheFactory);
    }
    finally
    {
        keyLocks[hash] = 0;
    }

    /* Do more stuff */
}

Link to the source code.

The protected method call is the CacheProvider.GetOrCreate<object>(key, CacheFactory). It is supposed to be called by one thread at a time, for the same key. For entering the protected region there is while loop that uses the Interlocked.CompareExchange to change a value of the keyLocks array from 0 to 1. So far so good. The part that concerns me is the line that exits the protected region: keyLocks[hash] = 0;. Since there is no barrier there, my understanding is that the C# compiler and the .NET Jitter are free to move instructions in either direction, stepping over this line. So an instruction inside the CacheProvider.GetOrCreate method can be moved after the keyLocks[hash] = 0;.

My question is: according to the specs, does the code above really ensure that the CacheProvider.GetOrCreate will not be called concurrently with the same key? Is the promise of mutual exclusion fulfilled by this code? Or the code is just buggy?

Context: The relevant code was added in the library in this pull request: Optimize cache to lock per key.


Solution

  • Looks buggy to me; the keyLocks[hash] = 0; is not a release store so parts of Do stuff can reorder out of the critical section, potentially becoming visible to another thread only after it acquires the lock.

    (Potentially reading already-modified data, or more likely having stores appear late and step on stores from the next thread, or not be seen by its loads.)

    It will very likely compile to correct asm on x86, where all asm stores have "release" semantics so only compile-time reordering could break things, but not on ARM / AArch64 or other mainstream ISAs that are weakly ordered. So testing on x86 can't reveal this bug unless you actually do get compile-time reordering. (It's still broken, the bug is just dormant.)

    https://preshing.com/20121019/this-is-why-they-call-it-a-weakly-ordered-cpu/ demos a spinlock in C++ that uses relaxed instead of acquire / release, and that it breaks in practice on ARM. That example is exactly like this, except here the CAS is like C++ memory_order_seq_cst so the top of the critical section is strong enough. But that's not sufficient; stronger ordering for taking the lock doesn't save you from too weak an unlock.


    A basic spinlock needs an acquire RMW to get exclusive ownership, and a release store to unlock, hence the names. That's sufficient to keep Do stuff contained inside the critical section in that direction.

    In C#, a release store can be done with Volatile.Write, or via assignment to a volatile object. My understanding is that those are equivalent to C++ foo.store(val, std::memory_order_release).

    Related x86 asm examples and spinlock discussion:

    • Spinlock with XCHG unlocking (the unlock does not need to be an atomic RMW, and does not need to be as strong a barrier as Interlocked.Exchange, but does need to be release)
    • Locks around memory manipulation via inline assembly - hand-written x86 asm, with some discussion of attempting to take the lock first and then spinning read-only vs. starting with a read-only access which optimized for the contended case. The naive lock here keeps spamming atomic CAS attempts even when we haven't seen any evidence that it might be available. It uses Thread.Yield() instead of SpinWait.SpinOnce(), which might be good if you have more threads than cores and critical sections tend to take a long time to unlock.