
Parallelize a loop using std::thread and good practices

Possible Duplicate:
C++ 2011 : std::thread : simple example to parallelize a loop?

Consider the following program, which distributes a computation over the elements of a vector (I have never used std::thread before):

// vectorop.cpp
// compilation: g++ -O3 -std=c++0x vectorop.cpp -o vectorop -lpthread
// execution: time ./vectorop 100 50000000 
// (100: number of threads, 50000000: vector size)
#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <vector>
#include <thread>
#include <cmath>
#include <algorithm>
#include <numeric>

// Some calculation that takes some time
template<typename T> 
void f(std::vector<T>& v, unsigned int first, unsigned int last) {
    for (unsigned int i = first; i < last; ++i) {
        v[i] = std::sin(v[i])+std::exp(std::cos(v[i]))/std::exp(std::sin(v[i])); 
    }
}

// Main
int main(int argc, char* argv[]) {

    // Variables
    const int nthreads = (argc > 1) ? std::atol(argv[1]) : (1);
    const int n = (argc > 2) ? std::atol(argv[2]) : (100000000);
    double x = 0;
    std::vector<std::thread> t;
    std::vector<double> v(n);

    // Initialization
    std::iota(v.begin(), v.end(), 0);

    // Start threads
    for (unsigned int i = 0; i < n; i += std::max(1, n/nthreads)) {
        // question 1: 
        // how to compute the first/last indexes attributed to each thread 
        // with a more "elegant" formula ?
        std::cout<<i<<" "<<std::min<std::size_t>(i+std::max(1, n/nthreads), v.size())<<std::endl;
        t.push_back(std::thread(f<double>, std::ref(v), i, std::min<std::size_t>(i+std::max(1, n/nthreads), v.size())));
    }

    // Finish threads
    for (unsigned int i = 0; i < t.size(); ++i) {
        t[i].join();
    }
    // question 2: 
    // how to be sure that all threads are finished here ?
    // how to "wait" for the end of all threads ?

    // Finalization
    for (unsigned int i = 0; i < n; ++i) {
        x += v[i];
    }
    std::cout<<std::setprecision(15)<<x<<std::endl;
    return 0;
}

There are already two questions embedded in the code.

A third one would be: is this code completely OK, or could it be written in a more elegant way using std::thread? I do not know the "good practices" for using std::thread...

Solution

On the first question, how to compute the range for each thread: I extracted the constants and gave them names to make the code easier to read. As a matter of good practice I also used a lambda, which makes the code easier to modify: the code in the lambda is only ever used here, while the function f can be called from other code throughout the program. Use this split to put shared parts of the code in a function and specialized parts that are used only once in the lambda.

const size_t itemsPerThread = std::max(1, n/nthreads);
for (size_t nextIndex = 0; nextIndex < v.size(); nextIndex += itemsPerThread)
{
    const size_t beginIndex = nextIndex;
    const size_t endIndex = std::min(nextIndex + itemsPerThread, v.size());
    std::cout << beginIndex << " " << endIndex << std::endl;
    // Capture the vector by reference and the two indexes by value
    t.push_back(std::thread([&v, beginIndex, endIndex]{ f(v, beginIndex, endIndex); }));
}
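One thing to watch with a fixed step of n/nthreads: when n is not a multiple of nthreads, the loop starts one more thread than requested (for n = 10 and nthreads = 3 the ranges are 0-3, 3-6, 6-9 and 9-10). If exactly nthreads ranges are wanted, the bounds can be computed per chunk instead. The following is only a sketch; chunkBounds is a hypothetical helper name, not something from the original program or the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper (an assumption, not part of the original code):
// returns the half-open range [first, last) for chunk k when n items are
// split into `parts` chunks whose sizes differ by at most one.
std::pair<std::size_t, std::size_t> chunkBounds(std::size_t n, std::size_t parts, std::size_t k) {
    assert(parts > 0 && k < parts);
    const std::size_t base = n / parts;   // minimum number of items per chunk
    const std::size_t extra = n % parts;  // the first `extra` chunks get one more item
    const std::size_t first = k * base + std::min(k, extra);
    const std::size_t last = first + base + (k < extra ? 1 : 0);
    return {first, last};
}
```

With this, the thread-starting loop would run k from 0 to nthreads - 1 and pass chunkBounds(v.size(), nthreads, k) to each thread. When n is smaller than nthreads, the trailing chunks are empty, so the caller may want to skip them.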

A more advanced design would use a thread pool, but what that looks like depends on your application design, and the C++11 standard library does not provide one. For a good example of a threading model, see the Qt framework. If you are just getting started with threads, save this for later.

The second question was already answered in the comments. The std::thread::join function waits (blocks) until the thread has finished. Once you have called join on each thread and reached the code after the last join, you can be sure that all the threads have finished and that the std::thread objects can now be destroyed.
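To make the start-then-join pattern concrete, here is a minimal sketch that wraps both steps in one function. The name runInThreads is hypothetical, and f is restated here as a non-template function taking std::size_t indexes so that the block is self-contained; the calculation itself is the one from the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Same calculation as f in the question, restated with std::size_t indexes.
void f(std::vector<double>& v, std::size_t first, std::size_t last) {
    for (std::size_t i = first; i < last; ++i) {
        v[i] = std::sin(v[i]) + std::exp(std::cos(v[i])) / std::exp(std::sin(v[i]));
    }
}

// Hypothetical wrapper (an assumption, not from the original post):
// starts one thread per range, then blocks until all of them are done.
void runInThreads(std::vector<double>& v, std::size_t nthreads) {
    assert(nthreads > 0);
    const std::size_t itemsPerThread = std::max<std::size_t>(1, v.size() / nthreads);
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < v.size(); begin += itemsPerThread) {
        const std::size_t end = std::min(begin + itemsPerThread, v.size());
        workers.emplace_back(f, std::ref(v), begin, end);
    }
    for (std::thread& w : workers) {
        w.join();  // blocks until this worker has returned from f
    }
    // All workers have finished here, so reading v is safe.
}
```

Each thread writes only to its own range of v, so no locking is needed. A std::thread that is still joinable when it is destroyed calls std::terminate, so every thread that was started has to be joined (or detached) before the vector of threads goes out of scope.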