Search code examples
synchronizationbarrier

What's wrong with this pseudocode for the synchronization barrier?


What's wrong with this pseudocode for the synchronization barrier?

global (shared) count : integer := P

procedure central_barrier

  if fetch_and_decrement(&count) == 1
    
     count := P

  else

    repeat until count == P

What is wrong with the above code?

p= number of threads


Solution

  • One issue is that it depends on the precise semantics of fetch_and_decrement. And of other aspects of the (assumed) memory and execution models. These things would need to be specified before one can confidently reason about the behavior of your pseudo-code.

    A 2nd issue is that the threads will be busy-waiting.

    A 3rd issue is that there appears to be a race condition. Suppose that a thread immediately calls central_barrier after returning from a central_barrier call. It could then execute the fetch_and_decrement(&count) before all of the other threads have had a chance to end their repeat until count == P loops. If it does, then some of those threads will keep on looping and the barrier will deadlock.

    Here is an example sequence with P == 3 threads that illustrates the race condition.

    1. count starts at 3
    2. Thread A calls central_barrier, count <- 2, enters loop.
    3. Thread B calls central_barrier, count <- 1, enters loop.
    4. Thread C calls central_barrier, count <- 0, resets count <- 3, returns.
    5. Thread C immediately calls central_barrier again, count <- 2, enters loop.
    6. Thread A, B and C are now all stuck in the loop because count == 2 and no other threads can change it. They are all waiting for count == 3, and that will never happen.

    Now if threads A and B are both able to test count between steps 3 and 4, then they will break out of the loop ... and all will be well. But we can't guarantee that. One or both may have been preempted or descheduled. (Imagine the case where there are more threads than physical cores to run them all.)