Search code examples
c#multithreading.net-corethread-safetyvolatile

Thread-safety while reading bool property without locks


I've been trying to track down an extremely weird issue that occurs very rarely and takes a long time to manifest. This code pattern seemed to stand out, and I wanted to ensure this is thread-safe. A simplified form of the pattern here shows a TestClassManager class that manages leasing for TestClass objects. A TestClass object will get leased, used, and released. Once a TestClass is released, it will not be modified/used by any other thread further.

class Program
{
    public static void Main(string[] args)
    {
        var tasks = new List<Task>();
        var testClassManager = new TestClassManager();

        tasks.Add(Task.Factory.StartNew(() => TestersOperationLoop(testClassManager), TaskCreationOptions.LongRunning));
        tasks.Add(Task.Factory.StartNew(() => ClearTestersLoop(testClassManager), TaskCreationOptions.LongRunning));

        Task.WaitAll(tasks.ToArray());
    }

    public class TestClassManager
    {
        private readonly object _testerCollectionLock = new object();

        private readonly Dictionary<long, TestClass> _leasedTesters = new Dictionary<long, TestClass>();
        private readonly Dictionary<long, TestClass> _releasedTesters = new Dictionary<long, TestClass>();

        public TestClass LeaseTester()
        {
            lock (_testerCollectionLock)
            {
                var tester = new TestClass();

                _leasedTesters.Add(tester.Id, tester);
                _releasedTesters.Remove(tester.Id);

                return tester;
            }
        }

        public void ReleaseTester(long id)
        {
            lock (_testerCollectionLock)
            {
                var tester = _leasedTesters[id];

                _leasedTesters.Remove(tester.Id);
                _releasedTesters.Add(tester.Id, tester);
            }
        }

        public void Clear()
        {
            lock (_testerCollectionLock)
            {
                foreach (var tester in _releasedTesters)
                {
                    if (!tester.Value.IsChanged)
                    {
                        // I have not seen this exception throw ever, but can this happen?
                        throw new InvalidOperationException("Is this even possible!?!");
                    }
                }

                var clearCount = _releasedTesters.Count;

                _releasedTesters.Clear();
            }
        }
    }

    public class TestClass
    {
        private static long _count;

        private long _id;
        private bool _status;

        private readonly object _lockObject = new object();

        public TestClass()
        {
            Id = Interlocked.Increment(ref _count);
        }

        // reading status without the lock
        public bool IsChanged
        {
            get
            {
                return _status;
            }
        }

        public long Id { get => _id; set => _id = value; }

        public void SetStatusToTrue()
        {
            lock (_lockObject)
            {
                _status = true;
            }
        }
    }

    public static void TestersOperationLoop(TestClassManager testClassManager)
    {
        while (true)
        {
            var tester = testClassManager.LeaseTester();

            tester.SetStatusToTrue();
            testClassManager.ReleaseTester(tester.Id);
        }
    }

    public static void ClearTestersLoop(TestClassManager testClassManager)
    {
        while (true)
        {
            testClassManager.Clear();
        }
    }
}

Is the check for TestClass.IsChanged property from within the TestClassManager.Clear method thread-safe? I don't ever see the InvalidOperationException, but is that possible? If it is, that would explain my issue.

Irrespective of that, I am going to lock the read anyway to follow commonly suggested pattern of locked read if locked write. But I wanted to get some closure on understanding if this would actually cause an issue, since this would explain that extremely weird rare bug!! that keeps me awake at night.

Thanks!


Solution

  • TLDR: Your code is thread-safe. You are correct in worrying that reads of the _status field through the IsChanged property may result in stale values; however, this won't happen in your Clear method, since the other existing lock statements already protect against this.

    There are two related concepts that often get confused in multithreaded programming: mutual exclusion and memory consistency. Mutual exclusion involves the delineation of critical sections of your code, such that only one thread may execute them at any time. This is what lock is primarily meant to achieve.

    Memory consistency is a more esoteric topic, and deals with the order in which writes to memory locations by one thread may appear to reads of the same memory locations by other threads. Memory operations may appear to be reordered across threads for performance reasons, such as better utilization of local caches. Without synchronization, threads may read stale values of memory locations that have been updated by other threads – the write appears to have been performed after the read from the reader thread's perspective. To avoid such reordering, the programmer may introduce memory barriers into their code, which would prevent memory operations from being moved across. In practice, such memory barriers are usually only generated implicitly as the result of other threading operations. For example, the lock statement generates a memory barrier both on entry and on exit.

    However, the generation of memory barriers by the lock statement is only a side-effect, since the primary purpose of lock is to enforce mutual exclusion. This can give rise to confusion when programmers encounter other situations where mutual exclusion is also achieved, but memory barriers are not implicitly generated. One such situation is the reading and writing of fields whose width is up to 32 or 64 bits, which are guaranteed to be atomic on architectures of that width. Reading or writing a boolean is inherently thread-safe – you never need a lock to enforce mutual exclusion while doing so, as no other concurrent thread may "corrupt" the value. However, reading or writing a boolean without a lock means that no memory barriers get generated, so stale values may result.

    Let's apply this discussion to a stripped-down version of your code:

    // Thread A:
    _testerStatus = true;                  // tester.SetStatusToTrue();
    _testerReleased = true;                // testClassManager.ReleaseTester(tester.Id);
    
    // Thread B:
    if (_testerReleased)                               // foreach (var tester in _releasedTesters)
        Console.WriteLine($"Status: {_testerStatus}"); //     if (!tester.Value.IsChanged)
    

    Is it possible for the above program to ever output false? In this case, yes. This is the classic example discussed under Nonblocking Synchronization (recommended reading). The bug can be fixed by adding memory barriers (explicitly, as below, or implicitly, as discussed further below):

    // Thread A:
    _testerStatus = true;                  // tester.SetStatusToTrue();
    Thread.MemoryBarrier();                // Barrier 1
    _testerReleased = true;                // testClassManager.ReleaseTester(tester.Id);
    Thread.MemoryBarrier();                // Barrier 2
    
    // Thread B:
    Thread.MemoryBarrier();                            // Barrier 3
    if (_testerReleased)                               // foreach (var tester in _releasedTesters)
    {
        Thread.MemoryBarrier();                        //     Barrier 4
        Console.WriteLine($"Status: {_testerStatus}"); //     if (!tester.Value.IsChanged)
    }
    

    In your code, your ReleaseTester method has an outer lock, which implicitly generates the equivalent to Barriers 1 and 2 above. Similarly, your Clear method also has an outer lock, so it generates the equivalent to Barrier 3. Your code does not generate the equivalent to Barrier 4; however, I believe that this barrier is unnecessary in your case, since the mutual exclusion enforced by the lock entails that Barrier 2 must have been executed before Barrier 3 if the tester was released.