c++ · multithreading · algorithm · boost · boost-thread

Safe multi-thread counter increment


For example, I've got some work that is computed simultaneously by multiple threads.

For demonstration purposes the work is performed inside a while loop. In a single iteration each thread performs its own portion of the work, and before the next iteration begins a counter should be incremented exactly once.

My problem is that the counter is updated by each thread, so it ends up being incremented once per thread rather than once per iteration.

As this seems like a relatively simple thing to want to do, I presume there is a 'best practice' or common way to go about it?

Here is some sample code to illustrate the issue and help the discussion along. (I'm using Boost threads.)

#include <boost/thread/mutex.hpp>

class someTask {
public:
    int mCounter; //initialized to 0
    int mTotal;   //initialized to e.g. 100000
    boost::mutex cntmutex;

    int getCount()
    {
        boost::mutex::scoped_lock lock( cntmutex );
        return mCounter;
    }

    void process( int thread_id, int numThreads )
    {
        while ( getCount() < mTotal )
        {
            // The main task is performed here and is divided
            // into sub-tasks based on the thread_id and numThreads

            // Wait for all threads to get to this point

            cntmutex.lock();
            mCounter++;  // < ---- how to ensure this is only updated once?
            cntmutex.unlock();
        }
    }
};

Solution

  • The main problem I see here is that you are reasoning at too low a level. Therefore, I am going to present an alternative solution based on the new C++11 thread API.

    The main idea is that you essentially have a schedule -> dispatch -> do -> collect -> loop routine. In your example you try to reason about all of this within the do phase, which is quite hard. Your pattern can be expressed much more easily using the opposite approach.

    First we isolate the work to be done in its own routine:

    void process_thread(size_t id, size_t numThreads) {
        // do something
    }
    
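    For concreteness, here is a minimal sketch of what the body might look like, splitting a fixed pool of sub-tasks between the threads with a strided loop; the sub-task count and the do_sub_task helper are purely illustrative placeholders:

    #include <cstddef>

    // Placeholder for the real per-sub-task work.
    void do_sub_task(size_t /*index*/) {
        // real work would go here
    }

    void process_thread(size_t id, size_t numThreads) {
        size_t const totalSubTasks = 1000; // illustrative amount of work

        // Thread `id` handles sub-tasks id, id + numThreads, id + 2*numThreads, ...
        for (size_t i = id; i < totalSubTasks; i += numThreads) {
            do_sub_task(i);
        }
    }
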

    Now, we can easily invoke this routine:

    #include <future>
    #include <thread>
    #include <vector>
    
    void process(size_t const total, size_t const numThreads) {
        for (size_t count = 0; count != total; ++count) {
            std::vector< std::future<void> > results;

            // Create all threads, launch the work!
            for (size_t id = 0; id != numThreads; ++id) {
                results.push_back(std::async(process_thread, id, numThreads));
            }

            // The destruction of `std::future`
            // requires waiting for the task to complete (*)
        }
    }
    

    (*) See this question.
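
    If you would rather not rely on that blocking destructor, the collect phase can be made explicit by waiting on each future yourself. A minimal sketch, reusing the process_thread routine above:

    #include <future>
    #include <vector>

    void process(size_t const total, size_t const numThreads) {
        for (size_t count = 0; count != total; ++count) {
            std::vector< std::future<void> > results;

            // Dispatch: launch one task per thread id.
            for (size_t id = 0; id != numThreads; ++id) {
                results.push_back(std::async(process_thread, id, numThreads));
            }

            // Collect: explicitly wait for every task before the next iteration.
            for (auto& result : results) {
                result.get(); // also rethrows any exception from the task
            }
        }
    }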

    You can read more about std::async here, and a short introduction is offered here (they appear to be somewhat contradictory on the effect of the launch policy, oh well). It is simpler here to let the implementation decide whether or not to create OS threads: it can adapt depending on the number of available cores.
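
    If you do decide you want a dedicated OS thread per task regardless, you can pass an explicit launch policy. A small sketch, where dispatch_one is just a hypothetical helper name:

    #include <future>

    void process_thread(size_t id, size_t numThreads); // defined above

    // Forces a new thread for each task, instead of letting the
    // implementation choose (or defer) as the policy-less call does.
    std::future<void> dispatch_one(size_t id, size_t numThreads) {
        return std::async(std::launch::async, process_thread, id, numThreads);
    }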

    Note how the code is simplified by removing shared state. Because the threads share nothing, we no longer have to worry about synchronization explicitly!
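
    If each iteration actually needs to produce a result, the same pattern still avoids shared state: let the per-thread routine return a value and combine the futures during the collect phase. A minimal sketch, where partial_work is a hypothetical stand-in for the real per-thread computation:

    #include <future>
    #include <vector>

    // Hypothetical per-thread computation returning a partial result.
    int partial_work(size_t id, size_t /*numThreads*/) {
        return static_cast<int>(id); // placeholder
    }

    int process_and_sum(size_t const numThreads) {
        std::vector< std::future<int> > results;

        // Dispatch: each task computes its partial result independently.
        for (size_t id = 0; id != numThreads; ++id) {
            results.push_back(std::async(partial_work, id, numThreads));
        }

        // Collect: the combined total is only touched by the dispatching
        // thread, so no explicit synchronization is needed.
        int total = 0;
        for (auto& result : results) {
            total += result.get();
        }
        return total;
    }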