I was playing around with std::thread
and something weird popped up:
#include <thread>

int k = 0;

int main() {
    std::thread t1([]() { while (k < 1000000000) { k = k + 1; }});
    std::thread t2([]() { while (k < 1000000000) { k = k + 1; }});
    t1.join();
    t2.join();
    return 0;
}
When compiling the above code with no optimizations using clang++, I got the following benchmarks:
real 0m2.377s
user 0m4.688s
sys 0m0.005s
I then changed my code to use only one thread:
#include <thread>

int k = 0;

int main() {
    std::thread t1([]() { while (k < 1000000000) { k = k + 1; }});
    t1.join();
    return 0;
}
And these were the new benchmarks:
real 0m2.304s
user 0m2.298s
sys 0m0.003s
Why is the code utilizing 2 threads slower than the code utilizing 1?
You have two threads fighting over the same variable, k. So you are spending time where the processors say "Processor 1: Hey, do you know what value k has? Processor 2: Sure, here you go!", ping-ponging the cache line that holds k back and forth every few updates. Since k isn't atomic, the two threads also race on it (formally a data race, which is undefined behavior in C++), so there's no guarantee that thread 2 doesn't write an "old" value of k. The next time thread 1 reads the value, it may have jumped back 1, 2, 10 or 100 steps, and that work has to be done over again. In theory that could lead to neither of the loops ever finishing, but that would require quite a bit of bad luck.