c++ multithreading compiler-optimization busy-waiting

Why is the compiler allowed to optimize out this busy waiting loop?

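#include <atomic>   // std::atomic
#include <chrono>   // std::chrono::seconds
#include <cstdio>   // printf
#include <thread>   // std::thread, std::this_thread::sleep_for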

int main()
{
    std::atomic<bool> ready{false};

    std::thread threadB = std::thread([&]() {
        while (!ready) {}

        printf("Hello from B\n");
    });

    std::this_thread::sleep_for(std::chrono::seconds(1));

    printf("Hello from A\n");

    ready = true;

    threadB.join();

    printf("Hello again from A\n");
}

This example is from the CppCon talk https://www.youtube.com/watch?v=F6Ipn7gCOsY&ab_channel=CppCon (around minute 17).

The goal is to print "Hello from A" first and only then let threadB proceed past its loop. I know busy waiting should be avoided because it burns a lot of CPU.
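
For comparison, a blocking version that avoids the spin could look roughly like the sketch below. It assumes a std::condition_variable guarded by a std::mutex. The helper name run_blocking_handshake and the log vector are hypothetical, introduced only so the ordering can be checked; they are not from the talk.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: runs the A/B handshake without spinning and
// returns the messages in the order they were produced.
std::vector<std::string> run_blocking_handshake()
{
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;            // plain bool is fine here: protected by m
    std::vector<std::string> log;  // also protected by m while B is running

    std::thread threadB([&]() {
        std::unique_lock<std::mutex> lock(m);
        // Sleeps until notified and the predicate holds; no CPU is burned
        // and spurious wakeups are handled by the predicate.
        cv.wait(lock, [&] { return ready; });
        log.push_back("Hello from B");
    });

    {
        std::lock_guard<std::mutex> lock(m);
        log.push_back("Hello from A");
        ready = true;
    }
    cv.notify_one();

    threadB.join();
    log.push_back("Hello again from A");  // safe: B has been joined

    assert(log.size() == 3);
    return log;
}
```

Calling run_blocking_handshake() should yield the three messages in the same order the original program is meant to print them, with threadB asleep rather than spinning while it waits.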

The author said that the compiler can optimize the while (!ready) {} loop by reading ready into a register once, because it sees that threadB never sleeps and concludes that ready can never change. But even if the thread never sleeps, another thread could still change the value, right? There is no data race because ready is atomic. The author also states that this code is UB. Can somebody explain why the compiler would be allowed to do such an optimization?

Solution

The author admits in one of the comments below the video that he was wrong:

I had thought so, but it appears I was wrong; the compiler cannot hoist the atomic read out of the loop. The advice at @17:54 is still correct — you should still be very careful and beware of situations where the compiler might reorder or coalesce or eliminate atomic accesses in general — but this particular while-loop is NOT actually such a situation. For some (mostly theoretical) examples of how a compiler might optimize atomic access patterns, see JF Bastien's N4455 "No Sane Compiler Would Optimize Atomics" http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html
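
To illustrate the corrected claim, here is a minimal sketch (not code from the talk) of the same spin loop written with explicit memory orders. The helper name spin_then_read and the payload variable are hypothetical. Every iteration performs its own atomic load, which per the comment above the compiler cannot hoist out of the loop, so the loop ends once the store becomes visible. The release/acquire pair additionally makes the earlier plain write to payload visible to threadB.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: thread B spins on an atomic flag, then reads a
// value that thread A published before setting the flag.
int spin_then_read()
{
    std::atomic<bool> ready{false};
    int payload = 0;  // plain int, published through the release/acquire pair
    int seen = -1;    // written by B, read by A only after join()

    std::thread threadB([&]() {
        // A fresh atomic load on every iteration.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();  // be polite while spinning
        }
        seen = payload;  // ordered after A's write by the acquire load
    });

    payload = 42;
    ready.store(true, std::memory_order_release);

    threadB.join();
    assert(seen == 42);
    return seen;
}
```

The default ready = true and !ready in the question use sequentially consistent operations, which are at least as strong as the acquire and release orders spelled out here.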