Minimal example of C++ concurrency bug due to one thread loading the value at thread startup

At 6:44 of the talk "Real-time Confessions in C++", the speaker shows buggy code that essentially boils down to:

Thread A writes to an unsynchronized integer called x
Much later, thread B reads x (in a loop)
Thread B does not see the updated value of x because the optimizing compiler realized that thread B never changed x, and thus the value of x could be loaded once at the start of the thread and never again.

The speaker is trying to show that synchronization is necessary even on, say, a single-core machine, since otherwise compiler optimizations can render your code buggy.

I am trying to come up with a minimal example of this bug. Here is what I have:

#include <string>
#include <iostream>
#include <thread>
#include <unistd.h>

int var;  // unsynchronized, i.e., no std::atomic or anything

void f() {
    for (int i = 0; i < 10; i++) {
        sleep(1);
        int x = var * var;
        std::cout << x << std::endl;
    }
}

int main() {
    var = 1;

    std::thread t(f);
  
    sleep(2);
    var = 10;

    t.join();
}

My hope is that inside f(), the compiler will realize that it can load var and compute x once at the beginning of the thread, instead of loading var over and over in the loop, thereby never seeing that the value of var was changed from 1 to 10 on the main thread.

Unfortunately, the program behaves "correctly" as shown:

$ clang++ -O3 test.cpp && ./a.out
1
100
100
100
...

I was hoping for the output to be all 1s.

What am I doing wrong? How do I properly create a minimal example of this concurrency bug?

Solution

I would never have assumed that I'd write something like this, but here's a "fixed" example that, well, "breaks properly". At least on my box.

#include <iostream>
#include <thread>
#include <unistd.h>

bool test=false;  // unsynchronized, i.e., no std::atomic or anything
volatile bool quit=false;

void f() {
    std::cout << "thread: " << test << "\n";
    unsigned long long count=0;
    while (!quit) {
        count++;
        if (test) {
            std::cout << "changed\n";
            break;
        }
    }
    std::cout << "loops: " << count << "\n";
    std::cout << "thread: " << test << "\n";
}

int main() {
    std::thread t(f);

    sleep(1);
    test=true;
    std::cout << "main: " << test << "\n";

    sleep(1);
    quit=true;

    t.join();
}

If I run this without optimization, I get this:

stieber@gatekeeper:~ $ g++ Test.cpp; ./a.out
thread: 0
changed
loops: 150478526
thread: 1
main: 1

So, we can properly detect the change mid-thread.

Now, let's try this with optimization:

stieber@gatekeeper:~ $ g++ -O3 Test.cpp; ./a.out
thread: 0
main: 1
loops: 1494121136
thread: 1

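As you can see, the "changed" message never appears: the compiler saw that nothing inside the loop modifies test, so it checked test once before the loop and optimized the check inside the loop away.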

In contrast to your code, I avoided all calls inside the loop to functions that the compiler can't see into. Your loop calls sleep() and writes to std::cout; when the compiler calls something "unknown", it generally has to assume that global variables could be modified by the call, so it reloads var afterwards. It doesn't really care whether you're calling a standardized library function whose specification would allow it to assume that globals aren't changed.
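To make the transformation concrete, here is a rough sketch of the loop as written next to what the optimizer is allowed to turn it into. The function names loop_as_written and loop_as_hoisted are made up for illustration, and the second function is my hand-written approximation of the effect, not actual compiler output.

```cpp
bool test = false;            // plain global, as in the example above
volatile bool quit = false;   // volatile, so it is re-read on every iteration

// The loop as written: `test` appears to be checked on every iteration.
// Returns true if the change was noticed, false if `quit` ended the loop.
bool loop_as_written(unsigned long long& count) {
    while (!quit) {
        count++;
        if (test) return true;   // the "changed" case
    }
    return false;
}

// Hand-written approximation of what the optimizer may do: nothing in the
// loop body writes `test` or calls code the compiler can't see, so the load
// can be moved out of the loop and done only once.
bool loop_as_hoisted(unsigned long long& count) {
    const bool cached = test;    // single load before the loop
    while (!quit) {
        count++;
        if (cached) return true;
    }
    return false;
}
```

In a single thread both functions behave identically, which is exactly why the compiler is allowed to make the change. The difference only shows up when another thread writes test while the loop is running.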

If you also declare test as volatile (volatile bool test=false;), the output changes yet again:

stieber@gatekeeper:~ $ g++ -O3 Test.cpp; ./a.out
thread: 0
main: changed
loops: 481792699
thread: 1
1

Note how the output gets mangled because I'm not syncing the std::cout; however, you can still spot the changed output now.

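In case you haven't encountered it before, volatile is an ancient keyword that predates modern civilization; it is used to indicate that something could just "change by itself". While it is meant for accessing things like hardware registers, I was just abusing it here to force the compiler to actually keep looking at the quit variable (and, in the last run, at test). Note that volatile is not a thread-synchronization mechanism: concurrent unsynchronized access to a non-atomic variable is still a data race as far as the C++ standard is concerned, so real code should use std::atomic or a mutex instead.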
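For comparison, here is a minimal sketch of a properly synchronized variant using std::atomic. It is simplified to a single flag. The function names spin_until_changed and run_demo and the 50 ms delay are my own choices for illustration and are not part of the example above.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> test{false};   // atomic: each load() really reads the flag

// Spins until another thread sets `test`, and returns the number of iterations.
unsigned long long spin_until_changed() {
    unsigned long long count = 0;
    while (!test.load()) {
        count++;
    }
    return count;
}

// Starts the spinning thread, flips the flag a little later, and waits for
// the thread to finish. Returns the loop count reported by the worker.
unsigned long long run_demo() {
    test.store(false);
    unsigned long long loops = 0;
    std::thread t([&loops] { loops = spin_until_changed(); });

    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    test.store(true);            // becomes visible to the worker

    t.join();                    // after join, reading `loops` is safe
    return loops;
}
```

Calling run_demo() from main should return at any optimization level, because the compiler may not hoist an atomic load out of the loop the way it did with the plain bool.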