How to make a threadpool in c++ TBB?


I might not be measuring this correctly, but I have some simple code I'm playing with. I don't think it's a thread pool, because if I make the work unit very large the CPU goes to 190-199% (I have a dual core), but if I make the work units smaller and run many more of them, the program runs at only 140-160% CPU.

What I think is going on is that threads are not being pooled but are being destroyed and created as needed, which makes the program slower when the workloads are smaller. I'm using tbb::task_scheduler_init to control the number of threads, but I don't know how to tell TBB to keep the threads alive.

Here's some code to illustrate the problem:

#include <iostream>
#include <list>
#include <tbb/task.h>
#include <tbb/task_group.h>
//#include <stdlib.h>
#include "tbb/task_scheduler_init.h"
#include <boost/thread.hpp>

using namespace tbb;


long fib(long a)
{
    if (a < 2) return 1;
    
    return fib(a - 1) + fib(a - 2);
}

class PrintTask
{
public:
    void operator()()
    {
        //std::cout << "hi world!: " <<  boost::this_thread::get_id() << std::endl;
        
        fib(10);
    }
};

int main()
{
    tbb::task_scheduler_init init(4); //use up to 4 threads, including this one
    task_group group;
    
    
    for (int i = 0; i < 2000000; ++i)
    {
        group.run(PrintTask());
        //std::cout << i << std::endl;
    }
    
    std::cout << "done" << std::endl;
    group.wait();
    
    return(0);
}

If you change the argument of fib from 10 to 40-45, the work for each task becomes large, so the CPU hits full utilization. With the current setup, each job is very small, but there are a great many of them.

Note: something I noticed that may be related is that, in the above case, the program completely uses up my memory (4 GB free). Could the slowdown be related to that? Also, why would this program take all the memory? What is it storing in memory? If I'm just calling the thread, isn't there a queue telling it how many times to run, or is it saving the entire thread in memory to run?

I read the tutorial but am still confused by its behavior (although I do get the intended outcome).


Solution

  • You shouldn't worry so much about CPU utilization; look instead at execution time and at speedup versus a sequential run.

    There are a few things that you need to understand about TBB:

    You can think of the overhead of scheduling a task as being basically constant (it isn't quite, but it's close enough). The less work each task does, the closer its total cost comes to this constant overhead. Once the work per task is close to this constant, you won't see speedups.

    Also, threads go idle when they can't find work. I'm guessing that when you're calling 'fib(10)', the cost of calling run and of the necessary task allocations approaches the cost of actually executing the work.
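
    To make the overhead argument concrete, here is a minimal sketch that uses only the standard library (std::async, not TBB) to batch the same work into a few large chunks. The helper name run_chunked and its parameters are made up for illustration. The per-task cost is paid once per chunk instead of once per fib call:

```cpp
#include <algorithm>
#include <cassert>
#include <future>
#include <vector>

// Same toy workload as in the question (fib(0) == fib(1) == 1).
long fib(long a)
{
    if (a < 2) return 1;
    return fib(a - 1) + fib(a - 2);
}

// Hypothetical helper: calls fib(n) `items` times, split across at most
// `chunks` asynchronous tasks, and returns the sum of all results.
// Task-creation overhead is paid per chunk, not per item.
long run_chunked(long items, long chunks, long n)
{
    assert(items >= 0 && chunks > 0);
    const long per_chunk = (items + chunks - 1) / chunks; // ceiling division
    std::vector<std::future<long>> parts;

    for (long begin = 0; begin < items; begin += per_chunk)
    {
        const long end = std::min(items, begin + per_chunk);
        parts.push_back(std::async(std::launch::async, [begin, end, n] {
            long sum = 0;
            for (long i = begin; i < end; ++i) sum += fib(n);
            return sum;
        }));
    }

    long total = 0;
    for (auto& part : parts) total += part.get(); // wait for every chunk
    return total;
}
```

    Calling run_chunked(2000000, 4, 10) creates 4 tasks rather than 2000000, so the scheduling cost becomes negligible next to the work itself.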

    More specifically, if you really do have 2000000 items that all run the same task, you should probably be calling parallel_for or parallel_for_each.

    -Rick