c++ multithreading c++11 benchmarking stdthread

C++ threads: shared memory not updated despite absence of race

I'm experimenting with C++ standard threads. I wrote a small benchmark to test performance overhead and overall throughput. The principle is to run a loop of 1 billion iterations in one or several threads, pausing briefly from time to time.

In a first version I used counters in shared memory (i.e. normal variables). I expected the following output:

Sequential      1e+009 loops    4703 ms 212630 loops/ms
2 thrds:t1      1e+009 loops    4734 ms 211238 loops/ms
2 thrds:t2      1e+009 loops    4734 ms 211238 loops/ms
2 thrds:tt      2e+009 loops    4734 ms 422476 loops/ms
manythrd tn     1e+009 loops    7094 ms 140964 loops/ms
...  
manythrd tt     6e+009 loops    7094 ms 845785 loops/ms

Unfortunately, the display showed some counters as if they were uninitialised!

I could solve the issue by storing the end value of each counter in an atomic<> for later display. However, I do not understand why the version based on simple shared memory does not work properly: each thread uses its own counter, so there is no race condition. The display thread also accesses the counters only after the counting threads have finished. Using volatile did not help either.

Could anyone explain this strange behaviour to me (it looks as if memory was not updated) and tell me if I missed something?

Here are the shared variables:

const int maxthread = 6;
atomic<bool> other_finished{false};   // brace-init: atomics cannot be copy-initialised before C++17
atomic<long> acounter[maxthread];

Here is the code of the threaded function:

void foo(long& count, int ic, long maxcount)   
{
    count = 0;  
    while (count < maxcount) {
        count++;
        if (count % 10000000 == 0)
            this_thread::sleep_for(chrono::microseconds(1));
    }
    other_finished = true;      // atomic: announce work is finished
    acounter[ic] = count;       // atomic: share result 
}

Here is an example of how I benchmark the threads:

mytimer.on();                 // second run, two threads
thread t1(foo, counter[0], 0, maxcount);  // additional thread
foo(counter[1], 1, maxcount);         // main thread
t1.join();                    // wait end of additional thread
perf = mytimer.off();     
display_perf("2 thrds:t1", counter[0], perf);  // non atomic version of code
display_perf("2 thrds:t2", counter[1], perf);
display_perf("2 thrds:tt", counter[0] + counter[1], perf);

Solution

Here is a simplified version to reproduce the problem:

#include <thread>

void deep_thought(int& value) { value = 6 * 9; }

int main()
{
    int answer = 42;
    std::thread{deep_thought, answer}.join();
    return answer; // 42
}

This looks as if it passes a reference to answer to the worker function, which then assigns 6 * 9 to that reference and therefore to answer. However, the constructor of std::thread makes a copy of answer and passes a reference to the copy to the worker function, so the variable answer in the main thread is never changed.
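The copy can be observed directly. The following is only a minimal sketch: the function name is made up for illustration, and the worker takes a const reference so that the call compiles on any conforming compiler. The lambda captures by reference purely to report the result back to the caller.

```cpp
#include <thread>

// Hypothetical helper: returns true if the reference seen inside the
// thread refers to the caller's own variable, false if it refers to a copy.
bool worker_sees_original()
{
    int answer = 42;
    bool same = true;

    // `answer` is handed to std::thread as an ordinary argument, so the
    // thread stores its own copy and binds `seen` to that copy.
    std::thread t(
        [&same, &answer](const int& seen) { same = (&seen == &answer); },
        answer);
    t.join();   // after join, the write to `same` is visible here

    return same;
}
```

Calling worker_sees_original() returns false, because the address of the thread's private copy differs from the address of answer.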

Both GCC-4.9 and Clang-3.5 reject the above code, because the copied argument is passed to the worker function as an rvalue, which cannot bind to its non-const lvalue reference parameter. You can solve the problem by passing the variable with std::ref:

    std::thread{deep_thought, std::ref(answer)}.join();
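Applied to the benchmark in the question, the same fix might look like the sketch below. count_to and run_two_counters are simplified stand-ins made up for illustration (no timer, no sleep, no atomics), not the original code.

```cpp
#include <functional> // std::ref
#include <thread>

// Simplified stand-in for the question's foo(): counts up to maxcount
// through a reference to the caller's counter.
void count_to(long& count, long maxcount)
{
    count = 0;
    while (count < maxcount)
        ++count;
}

// Runs one counter in an additional thread and one in the calling thread,
// then returns the sum of both counters.
long run_two_counters(long maxcount)
{
    long counter[2] = {-1, -1};  // sentinels: both must be overwritten

    // std::ref makes the thread work on counter[0] itself, not on a copy.
    std::thread t1(count_to, std::ref(counter[0]), maxcount);
    count_to(counter[1], maxcount);  // calling thread uses its own counter
    t1.join();  // join synchronizes, so counter[0] is safe to read below

    return counter[0] + counter[1];
}
```

With this change, run_two_counters(1000) returns 2000. Each thread writes only its own counter and the counters are read only after join(), so plain long variables suffice and no atomic is needed.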