
TLS enumerable_thread_specific in TBB


I've been told that enumerable_thread_specific will improve thread performance, but I don't understand why. What is the benefit of using enumerable_thread_specific from the Intel Threading Building Blocks (TBB) library?

The documentation (link) is somewhat hazy on the motivation, but it seems to indicate that the purpose is to lazily create items in a list when you do not know the number of threads in advance, as in this example from the TBB documentation:

#include <cstdio>
#include <utility>

#include "tbb/enumerable_thread_specific.h"
#include "tbb/parallel_for.h"
#include "tbb/blocked_range.h"

using namespace tbb;

typedef enumerable_thread_specific< std::pair<int,int> > CounterType;
CounterType MyCounters (std::make_pair(0,0));

struct Body {
     void operator()(const tbb::blocked_range<int> &r) const {
          CounterType::reference my_counter = MyCounters.local();
          ++my_counter.first;
          for (int i = r.begin(); i != r.end(); ++i)
              ++my_counter.second;
     }
};

int main() {
     parallel_for( blocked_range<int>(0, 100000000), Body());

     for (CounterType::const_iterator i = MyCounters.begin();
         i != MyCounters.end(); ++i)
     {
            printf("Thread stats:\n");
            printf("     calls to operator(): %d", i->first);
            printf("     total # of iterations executed: %d\n\n",
                 i->second);
    }
}

Is this really necessary, and are there other benefits that the documentation does not list? It was suggested that there may be advantages for memory access across threads, but it's not clear to me how that would come about.


Solution

  • The idea of enumerable_thread_specific is to provide a container around the concept of TLS (thread_local in C++11), so that a value assigned by one thread can later be enumerated or combined by another thread. The performance benefit comes from the property it shares with plain TLS: each thread works on its own private copy of the data.

    Generally, TLS lets threads avoid contending for a processor cache line or a mutex, which would otherwise happen with a shared global object. See this blog for more details and for an explanation of the similar container combinable<>, which is also available in TBB.