
Is Entrance into a Windows Critical Section an atomic operation?


I wrote an FFI for critical sections, and I wrote a test for it in Haxe.

Tests run in the order they are defined (public functions are tests).

The test test_critical_section intermittently hangs and fails:

1   var criticalSection:CriticalSection;
2
3   #if master
4   public function test_init_critical_section() {
5       return assert(attempt({
6           criticalSection = synch.SynchLib.critical_section_init(SPIN_COUNT);
7           trace('criticalSection: $criticalSection');
8       }));
9   }
10  var criticalValue = 0;
11  var done = 0;
12  var numThreads = 50;
13  function work_in_critical_section(ID:Int, a:AssertionBuffer) {
14      sys.thread.Thread.create(() -> {
15          inline function threadMsg(msg:String)
16              trace('Thread ID $ID: $msg');
17          
18          
19          threadMsg("Attempting to enter critical section");
20          criticalSection.critical_section_enter();
21          threadMsg("Entering critical section. Doing work.");
22          Sys.sleep(Std.random(100)/500); // simulate work in section
23          criticalValue+= 10;
24          done++;
25          a.assert(criticalValue == done * 10);
26          threadMsg("Leaving critical section. Work done. done: " + done);
27          criticalSection.critical_section_leave();
28          if (done == numThreads) {
29              a.assert(criticalValue == numThreads * 10);
30              a.done();
31              
32          }
33      });
34  }
35  @:timeout(30000)
36  public function test_critical_section() {
37      var a = new AssertionBuffer();
38      for (i in 0...numThreads)
39          work_in_critical_section(i, a);
40      return a;
41  }

But when I add Sys.sleep(ID/5); just before entering the critical section (on the blank line 18), the test passes every single time (with any number of threads). Without it, the test fails randomly (more often with a higher number of threads).

My conclusion from this test is that entrance to a critical section is not atomic, and multiple threads simultaneously attempting to enter may leave the critical section in an undefined state (leading to undefined/hanging behavior).

Is this the right conclusion, or am I simply misusing critical sections (so that the test needs to be rewritten)? And if it is the right conclusion, does that not mean that entrance into the critical section needs its own atomic locking/synchronization mechanism? (And further, if that is the case, what is the point of critical sections? Why would I not just use whatever that atomic synchronization mechanism is?)

To me, this seems problematic. For example, suppose 10 threads meet at a synchronization barrier (with a capacity of 10), and all 10 need to proceed through a critical section immediately after the 10th thread arrives. Does that mean I would have to synchronize/serialize access to the critical section entrance method (for instance, by sleeping so that only one thread attempts to enter the section at a given tick, as done to fix the failing test above)?

The FFI is written on top of synchapi.h (see EnterCriticalSection).


Solution

  • You read done outside the critical section: the check if (done == numThreads) on line 28 runs after critical_section_leave() on line 27. That is a race condition. If you want to look at the value of done, you need to do it before you leave the critical section.

    You might see a write to done from another thread, triggering the assert before the write to criticalValue is visible to the thread that saw the write to done.

    If the critical section protects criticalValue and done, then it is an error to access either of them without being in the critical section unless you are sure every thread that might access them has terminated. Your code violates this rule.
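
The rule above can be illustrated with a minimal sketch. It assumes Haxe's standard sys.thread.Mutex as a stand-in for the question's FFI CriticalSection (the FFI itself is not shown, so its behavior is not modeled here), and the names Main, NUM_THREADS, allDone and failures are invented for illustration. Each worker copies done into a local variable while it still holds the lock and only looks at that local copy after releasing it:

```haxe
import sys.thread.Thread;
import sys.thread.Mutex;
import sys.thread.Lock;

class Main {
    static inline var NUM_THREADS = 50; // hypothetical thread count

    // Shared state: only touched while `mutex` is held.
    static var criticalValue = 0;
    static var done = 0;
    static var failures = 0;

    static var mutex = new Mutex();  // stand-in for the FFI CriticalSection
    static var allDone = new Lock(); // released once, by the last worker

    static function work(id:Int) {
        Thread.create(() -> {
            mutex.acquire();
            criticalValue += 10;
            done++;
            // Check the invariant while the lock is still held.
            if (criticalValue != done * 10)
                failures++;
            // Snapshot the shared counter before leaving the section.
            var doneNow = done;
            mutex.release();

            // From here on only the thread-local copy is read, so exactly
            // one worker (the last to leave) sees NUM_THREADS.
            if (doneNow == NUM_THREADS)
                allDone.release();
        });
    }

    static function main() {
        for (id in 0...NUM_THREADS)
            work(id);

        allDone.wait(); // block until the last worker signals completion

        // Re-enter the lock to read the shared state from the main thread.
        mutex.acquire();
        trace('done: $done, criticalValue: $criticalValue, failures: $failures');
        mutex.release();
    }
}
```

Because the comparison against NUM_THREADS uses the snapshot taken inside the lock, no thread can observe a value of done written by another thread after it has left the section, and the completion signal fires exactly once. The same shape should carry over to the original test by moving the done == numThreads check (or a local copy of done) above critical_section_leave().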