multithreading algorithm pseudocode barrier

Understanding "A software combining tree barrier with optimized wakeup" algorithm

I am reading this pseudo code for a barrier synchronization algorithm from this paper, and I could not fully understand it.

The goal of the code is to create a barrier for multiple threads (threads can't pass the barrier unless all threads have completed) using something called "software combining tree" (not sure what it means)

Here is the pseudo code (though I encourage you to look at the article as well)

type node = record
      k : integer             // fan-in of this node
      count : integer         // initialized to k
      locksense : Boolean     // initially false
      parent : ^node          // pointer to parent node; nil if root

  shared nodes : array [0..P-1] of node
      // each element of nodes allocated in a different memory module or cache line
  processor private sense : Boolean  := true
  processor private mynode : ^node    // my group's leaf in the combining tree

  procedure combining_barrier 
      combining_barrier_aux (mynode)      // join the barrier
      sense := not sense                  // for next barrier

  procedure combining_barrier_aux (nodepointer : ^node)
      with nodepointer^ do
          if fetch_and_decrement (&count) = 1     // last one to reach this node
              if parent != nil
                  combining_barrier_aux (parent)
              count := k                          // prepare for next barrier
              locksense := not locksense          // release waiting processors
          repeat until locksense = sense

I understand that it implies building a binary tree but I didn't understand a few things.

Is P the number of threads?
What is k? What is "fan-in of this node"
The article mentions that threads are organized as groups on the leaves of the tree, what groups?
Is there exactly one node for each thread?
How do I get "my group's leaf in the combining tree"?

Solution

Is P the number of threads?

YES

What is k? What is "fan-in of this node".

This is called the ary-ness of the tree being constructed. This is a binary tree and hence k = 2. The idea is that for a smaller number of processors, the binary tree will suffice. But as the number of processors increases, then the levels in the tree will grow a lot. This is balanced by increasing the ary-ness by increasing the value of k. This will essentially enable more than two processors to be part of a leaf or a group. As the system scales to thousands of processors with interconnect, this may be important. The downside to this is the increased contention. As more processors become part of the same tree, they will be spinning on the same variable.

The article mentions that threads are organized as groups on the leaves of the tree, what groups?

The threads are organized into groups equal to number of leaves. The number of leaves is essentially the number of threads divided by the ary-ness or k. So for a set of 8 threads, with k=2, the number of leaves will be 4. This allows the threads in a group to spin on a single variable and also to distribute the spinning across multiple variables, rather than a single shared variable as in the basic centralized barrier algorithm.

Is there exactly one node for each thread?

The answer is NO. Of course there are at least as many nodes as the leaves. For a 8-thread problem, there will be at 4 nodes for the 4 leaves. Now, after this flat level, the "winners"(thread that comes last) will climb up to its parent in a recursive manner. The number of levels will be log P to base k. For each parent, there will be a node, eventually climbing up to the true root or the parent. for e.g. for the 8 thread problem, there will be 4 + 2 + 1 = 7 nodes.
How do I get "my group's leaf in the combining tree"?

This is a little tricky part. There is a formula based on modulus and some integer division. However, I have not seen a publicly available implementation. sorry, I can't reveal what I have seen only in a class, as that may not be appropriate. May be a google search or someone else can fill in this.