I'm experimenting with C++ standard threads. I wrote a small benchmark to test performance overhead and overall throughput. The principle is to run a loop of 1 billion iterations in one or several threads, pausing briefly from time to time.
In a first version I used counters in shared memory (i.e. normal variables). I expected the following output:
Sequential   1e+009 loops  4703 ms  212630 loops/ms
2 thrds:t1   1e+009 loops  4734 ms  211238 loops/ms
2 thrds:t2   1e+009 loops  4734 ms  211238 loops/ms
2 thrds:tt   2e+009 loops  4734 ms  422476 loops/ms
manythrd tn  1e+009 loops  7094 ms  140964 loops/ms
...
manythrd tt  6e+009 loops  7094 ms  845785 loops/ms
Unfortunately the display showed some counters as if they were uninitialised!
I could solve the issue by storing the end value of each counter in an atomic<> for later display. However I do not understand why the version based on simple shared memory does not work properly: each thread uses its own counter, so there is no race condition. Even the display thread accesses the counters only after the counting threads are finished. Using volatile did not help either.
Could anyone explain this strange behaviour to me (as if memory was not updated) and tell me if I missed something?
Here are the shared variables:
const int maxthread = 6;
atomic<bool> other_finished = false;
atomic<long> acounter[maxthread];
Here is the code of the threaded function:
void foo(long& count, int ic, long maxcount)
{
    count = 0;
    while (count < maxcount) {
        count++;
        if (count % 10000000 == 0)
            this_thread::sleep_for(chrono::microseconds(1));
    }
    other_finished = true;  // atomic: announce work is finished
    acounter[ic] = count;   // atomic: share result
}
Here is an example of how I benchmark the threads:
mytimer.on(); // second run, two threads
thread t1(foo, counter[0], 0, maxcount); // additional thread
foo(counter[1], 1, maxcount); // main thread
t1.join(); // wait end of additional thread
perf = mytimer.off();
display_perf("2 thrds:t1", counter[0], perf); // non atomic version of code
display_perf("2 thrds:t2", counter[1], perf);
display_perf("2 thrds:tt", counter[0] + counter[1], perf);
Here is a simplified version to reproduce the problem:
#include <thread>

void deep_thought(int& value) { value = 6 * 9; }

int main()
{
    int answer = 42;
    std::thread{deep_thought, answer}.join();
    return answer; // 42
}
It looks as though this passes a reference to answer to the worker function, which then assigns 6 * 9 to the reference and therefore to answer. However, the constructor of std::thread makes a copy of answer and passes a reference to the copy to the worker function, so the variable answer in the main thread is never changed.
Both GCC 4.9 and Clang 3.5 reject the above code, because the copied argument is handed to the worker function as an rvalue, which cannot bind to a non-const lvalue reference parameter. You can solve the problem by passing the variable with std::ref:
std::thread{deep_thought, std::ref(answer)}.join();