c++ multithreading boost parallel-processing boost-thread

Multiprocessor Boost::Thread? All threads running on one processor

I have an embarrassingly parallel problem that I want to execute on multiple processors. I had supposed that boost::thread would automatically send new threads to new processors, but all of them are executing on the same core as the parent process. Is it possible to get each thread to run on a different processor, or do I need something like MPI?

My suspicion is that boost::thread is simply not a multi-processor tool and that I'm asking it to do something it wasn't designed for.

EDIT: my question boils down to this: Why do all the threads execute on one processor? Is there a way to get boost::thread to send threads to different processors?

Here's the relevant sample of my code:

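#include <boost/thread.hpp>
#include <algorithm>
#include <deque>

size_t lim=1000;
// one thread per core; hardware_concurrency() may return 0, so use at least 1
size_t maxConcurrentThreads = std::max<size_t>(1, boost::thread::hardware_concurrency());
std::deque<int> vals(lim);
std::deque<boost::thread *> threads;
int i=0;
std::deque<int>::iterator it = vals.begin();
for (; it!=vals.end(); it++, i++) {
  threads.push_back(new boost::thread(doWork, it, i));
  // once the limit is reached, wait for the oldest thread before starting another
  while (threads.size() >= maxConcurrentThreads) {
    threads.front()->join();
    delete threads.front();
    threads.pop_front();
  }
}
// join (and free) whatever threads are still running
while(threads.size()) {
  threads.front()->join();
  delete threads.front();
  threads.pop_front();
}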

As should be clear, doWork does some calculation using the parameter i and stores the result in vals. My idea was to set maxConcurrentThreads equal to the number of available cores, so that each new thread would run on whichever core was idle. I just need someone to confirm that boost::thread cannot be made to work this way.

(I'd guess that there's a better way to limit the number of concurrent threads than using a queue; feel free to scold me for that as well.)
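One common alternative to the queue is to skip per-item threads entirely: split the index range into one contiguous chunk per hardware thread, so each thread is created once and does a large block of work. The sketch below assumes C++11 and uses std::thread in place of boost::thread (boost::thread has the same constructor style and its own hardware_concurrency()). The names computeOne and computeAllChunked are hypothetical, and computeOne mirrors the question's doWork summation but with a 64-bit accumulator.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical per-index computation standing in for doWork:
// sums the `size` consecutive integers starting at i, in 64-bit arithmetic.
std::int64_t computeOne(int i, int size) {
    std::int64_t ret = 0;
    for (std::int64_t j = i; j < static_cast<std::int64_t>(i) + size; ++j) {
        ret += j;
    }
    return ret;
}

// Fills out[k] = computeOne(k, size) for every k in [0, n), using one
// contiguous chunk of indices per hardware thread. Each thread is started
// once and does a large block of work, so thread start-up cost is amortized.
std::vector<std::int64_t> computeAllChunked(std::size_t n, int size) {
    std::vector<std::int64_t> out(n);
    unsigned hw = std::thread::hardware_concurrency();
    std::size_t numThreads = (hw == 0) ? 1 : hw;  // hardware_concurrency() may return 0
    numThreads = std::min(numThreads, std::max<std::size_t>(n, 1));
    std::size_t chunk = (n + numThreads - 1) / numThreads;  // ceiling division

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < numThreads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;  // no indices left for this or later threads
        // Each thread writes only its own slice of `out`, so no lock is needed.
        workers.emplace_back([&out, begin, end, size] {
            for (std::size_t k = begin; k < end; ++k) {
                out[k] = computeOne(static_cast<int>(k), size);
            }
        });
    }
    for (auto& w : workers) w.join();
    return out;
}
```

For example, computeAllChunked(1000, 10000000) starts at most hardware_concurrency() threads, each summing over a contiguous block of indices. Static chunks work well here because every index costs the same; if the cost per item varied a lot, a dynamic scheme that hands out indices on demand would balance the load better.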

Here's the doWork function:

void doWork(std::deque<int>::iterator it, int i) {
  int ret=0;
  int size = 1000; // originally 1000, later changed to 10,000,000
  for (int j=i; j<i+size; j++) {
    ret+=j;
  }
  *it=ret;
  return;
}
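One caveat once size is raised to 10,000,000: the sum 0 + 1 + ... + 9,999,999 is 49,999,995,000,000, far above INT_MAX (about 2.1 billion), so accumulating in `int` overflows, which is undefined behavior for signed types. A minimal sketch of the same loop with a 64-bit accumulator follows; the name doWork64 and the extra size parameter are hypothetical, and vals would need to become a std::deque<std::int64_t>.

```cpp
#include <cstdint>
#include <deque>

// Same summation as doWork, but accumulated in 64 bits so that
// size = 10,000,000 does not overflow.
void doWork64(std::deque<std::int64_t>::iterator it, int i, int size) {
    std::int64_t ret = 0;
    for (std::int64_t j = i; j < static_cast<std::int64_t>(i) + size; ++j) {
        ret += j;
    }
    *it = ret;  // each thread writes a different element, so no lock is required
}
```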

EDIT: As Martin James suggested, the problem was that doWork initially performed only 1000 int additions. With such a small job, creating and scheduling each thread took longer than the work itself, so only one processor was ever busy. Making the job larger (adding 10,000,000 ints) yielded the desired behavior. The takeaway: boost::thread does use multiple cores by default, but if each thread does less work than it costs to start it, you won't see any benefit from multithreading.

Thanks to everyone for aiding my understanding in this.

Solution

You are always joining the first thread in the queue. If that thread takes a long time, the other threads can finish in the meantime while no new ones are started, so it may end up as the only thread running. I guess what you want is to start a new thread as soon as any thread has completed.
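A way to get the "start new work as soon as any thread finishes" effect without creating a thread per item is a fixed pool of workers that claim indices from a shared atomic counter. This is a swapped-in technique, sketched with C++11 std::thread and std::atomic rather than boost::thread; the function name computeAllPooled and its parameters are hypothetical, and the inner loop assumes the same summation as doWork.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Dynamic work sharing: numWorkers threads repeatedly claim the next
// unprocessed index from a shared atomic counter. A worker that finishes an
// item immediately takes another, so nobody waits on one slow thread the way
// the join-the-front-of-the-queue loop does.
std::vector<std::int64_t> computeAllPooled(std::size_t n, int size, unsigned numWorkers) {
    std::vector<std::int64_t> out(n);
    std::atomic<std::size_t> next{0};
    if (numWorkers == 0) numWorkers = 1;

    auto worker = [&] {
        for (;;) {
            std::size_t k = next.fetch_add(1);  // claim one index atomically
            if (k >= n) return;                 // nothing left to do
            std::int64_t start = static_cast<std::int64_t>(k);
            std::int64_t ret = 0;               // assumed per-item work, as in doWork
            for (std::int64_t j = start; j < start + size; ++j) {
                ret += j;
            }
            out[k] = ret;  // each k is claimed by exactly one worker, so no lock
        }
    };

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numWorkers; ++t) workers.emplace_back(worker);
    for (auto& w : workers) w.join();
    return out;
}
```

In practice numWorkers would usually be std::thread::hardware_concurrency() (or boost::thread::hardware_concurrency()). Note that claiming one index at a time only pays off when each item is expensive; with 1000-addition items, handing out blocks of indices per fetch_add would keep the counter from becoming the bottleneck.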

I don't know why you get an effective concurrency level of only one, though.

After looking at the doWork function, I think it does so little work that it finishes in less time than it takes to start a thread in the first place. Try running it with more work (1000x).