I am reading this pseudo code for a barrier synchronization algorithm from this paper, and I could not fully understand it.
The goal of the code is to create a barrier for multiple threads (threads can't pass the barrier unless all threads have completed) using something called "software combining tree" (not sure what it means)
Here is the pseudo code (though I encourage you to look at the article as well)
type node = record
k : integer // fan-in of this node
count : integer // initialized to k
locksense : Boolean // initially false
parent : ^node // pointer to parent node; nil if root
shared nodes : array [0..P-1] of node
// each element of nodes allocated in a different memory module or cache line
processor private sense : Boolean := true
processor private mynode : ^node // my group's leaf in the combining tree
procedure combining_barrier
combining_barrier_aux (mynode) // join the barrier
sense := not sense // for next barrier
procedure combining_barrier_aux (nodepointer : ^node)
with nodepointer^ do
if fetch_and_decrement (&count) = 1 // last one to reach this node
if parent != nil
combining_barrier_aux (parent)
count := k // prepare for next barrier
locksense := not locksense // release waiting processors
repeat until locksense = sense
I understand that it implies building a binary tree but I didn't understand a few things.
YES
This is called the ary-ness of the tree being constructed. This is a binary tree and hence k = 2. The idea is that for a smaller number of processors, the binary tree will suffice. But as the number of processors increases, then the levels in the tree will grow a lot. This is balanced by increasing the ary-ness by increasing the value of k. This will essentially enable more than two processors to be part of a leaf or a group. As the system scales to thousands of processors with interconnect, this may be important. The downside to this is the increased contention. As more processors become part of the same tree, they will be spinning on the same variable.
The threads are organized into groups equal to number of leaves. The number of leaves is essentially the number of threads divided by the ary-ness or k. So for a set of 8 threads, with k=2, the number of leaves will be 4. This allows the threads in a group to spin on a single variable and also to distribute the spinning across multiple variables, rather than a single shared variable as in the basic centralized barrier algorithm.
Is there exactly one node for each thread?
The answer is NO. Of course there are at least as many nodes as the leaves. For a 8-thread problem, there will be at 4 nodes for the 4 leaves. Now, after this flat level, the "winners"(thread that comes last) will climb up to its parent in a recursive manner. The number of levels will be log P to base k
. For each parent, there will be a node, eventually climbing up to the true root or the parent.
for e.g. for the 8 thread problem, there will be 4 + 2 + 1 = 7 nodes.
How do I get "my group's leaf in the combining tree"?
This is a little tricky part. There is a formula based on modulus and some integer division. However, I have not seen a publicly available implementation. sorry, I can't reveal what I have seen only in a class, as that may not be appropriate. May be a google search or someone else can fill in this.