What's wrong with this pseudocode for the synchronization barrier?
global (shared) count : integer := P
procedure central_barrier
if fetch_and_decrement(&count) == 1
count := P
else
repeat until count == P
What is wrong with the above code?
p= number of threads
One issue is that it depends on the precise semantics of fetch_and_decrement
. And of other aspects of the (assumed) memory and execution models. These things would need to be specified before one can confidently reason about the behavior of your pseudo-code.
A 2nd issue is that the threads will be busy-waiting.
A 3rd issue is that there appears to be a race condition. Suppose that a thread immediately calls central_barrier
after returning from a central_barrier
call. It could then execute the fetch_and_decrement(&count)
before all of the other threads have had a chance to end their repeat until count == P
loops. If it does, then some of those threads will keep on looping and the barrier will deadlock.
Here is an example sequence with P == 3 threads that illustrates the race condition.
Now if threads A and B are both able to test count between steps 3 and 4, then they will break out of the loop ... and all will be well. But we can't guarantee that. One or both may have been preempted or descheduled. (Imagine the case where there are more threads than physical cores to run them all.)