c++ multithreading volatile thread-local boost-context

Making thread_local variables fully volatile


I'm working on a runtime library that uses user-level context switching (using Boost.Context), and I'm having trouble using thread_local variables. Consider the following (reduced) code:

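#include <iostream>

void UserLevelContextSwitch();  // provided by the runtime (built on Boost.Context)

thread_local int* volatile tli;

int main()
{
    tli = new int(1);   // part 1, done by thread 1
    UserLevelContextSwitch();
    int li = *tli;      // part 2, done by thread 2
    std::cout << li;
}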

Since there are two accesses to the thread_local variable, the main function is transformed by the compiler to something along these lines (reversed from assembly):

int* volatile* ptli = &tli; // address of the thread_local variable, computed once and kept in a register
*ptli = new int(1);
UserLevelContextSwitch();
int li = **ptli;
cout << li;

This seems to be a legal optimization, since the value of the volatile tli is not being cached in a register. The address of tli, however, is being cached, and it is not recomputed in part 2.

And that's the problem: after the user-level context switch, the thread that did part 1 goes somewhere else. Part 2 is then picked up by some other thread, which inherits the previous stack and register state. As a result, the thread executing part 2 reads the tli that belongs to thread 1 instead of its own.
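
The effect can be reproduced without Boost.Context by letting a second std::thread stand in for the thread that resumes the context. This is only a minimal sketch: Observation and observe_cached_tls_address are names made up for illustration, and the explicit cached pointer plays the role of the address that the compiler keeps in a register.

```cpp
#include <cassert>
#include <thread>

// Stand-in for the question's variable.
thread_local int* volatile tli = nullptr;

// Hypothetical record of what the second thread sees.
struct Observation {
    bool address_differs;   // is &tli different on the second thread?
    bool own_slot_is_null;  // is the second thread's own tli still unset?
    int  via_cached;        // value read through the first thread's cached address
};

Observation observe_cached_tls_address() {
    int payload = 1;
    int* volatile* cached = &tli;  // address resolved on the calling thread ("part 1")
    *cached = &payload;

    Observation obs{};
    std::thread second([&obs, cached] {  // plays the thread that resumes "part 2"
        obs.address_differs  = (&tli != cached);
        obs.own_slot_is_null = (tli == nullptr);
        obs.via_cached       = **cached;  // reaches the first thread's slot
    });
    second.join();

    *cached = nullptr;  // do not leave a pointer to a dead local behind
    assert(obs.address_differs);
    return obs;
}
```

Calling observe_cached_tls_address() from main shows the mismatch: the second thread's own tli is untouched, while the cached address still leads to the first thread's copy.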

I'm trying to figure out a way to prevent the compiler from caching the thread-local variable's address, and volatile doesn't go deep enough. Is there any trick (preferably standard, possibly GCC-specific) to prevent the caching of the thread-local variables' addresses?


Solution

  • There is no way to pair user-level context switches with TLS. Even with atomics and a full memory fence, caching the address looks like a legitimate optimization, since the thread_local variable is a file-scope variable that the compiler assumes cannot move. (That said, some compilers may still be sensitive to compiler memory barriers such as std::atomic_thread_fence and asm volatile ("" : : : "memory");.)

    Cilk uses the same technique you describe to implement "continuation stealing", where a different thread can continue execution after the sync point. Its developers explicitly discourage the use of TLS in a Cilk program. Instead, they recommend "hyperobjects", a special Cilk feature that substitutes for TLS (and also provides serial/deterministic join semantics). See also the Cilk developers' presentation about thread_local and parallelism.

    Also, Windows provides FLS (Fiber Local Storage) as a TLS replacement for when Fibers (the same kind of lightweight context switch) are in use.
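
The common idea behind these alternatives, storage that belongs to the context rather than to the thread, can be approximated portably by passing the per-context data explicitly. The following is a rough sketch under that assumption, not the Windows FLS API and not Cilk hyperobjects; ContextLocals, part1, part2 and run_migrating_task are hypothetical names, and two std::threads stand in for a context that is suspended on one thread and resumed on another.

```cpp
#include <cassert>
#include <memory>
#include <thread>

// Hypothetical per-context storage: it lives with the context, not with a thread.
struct ContextLocals {
    std::unique_ptr<int> li;
};

// Both parts receive the storage explicitly instead of naming a thread_local.
void part1(ContextLocals& locals) { locals.li = std::make_unique<int>(1); }

int part2(const ContextLocals& locals) {
    assert(locals.li && "part1 must have run on this context first");
    return *locals.li;
}

// Runs the two parts on two different threads.
int run_migrating_task() {
    ContextLocals locals;

    std::thread first([&locals] { part1(locals); });
    first.join();  // "context switch": the thread that ran part 1 goes away

    int result = 0;
    std::thread second([&locals, &result] { result = part2(locals); });
    second.join();
    return result;
}
```

Because part2 reaches the data through its argument, there is no thread-dependent address for the compiler to cache, so run_migrating_task() yields the value stored by part1 regardless of which thread resumes the work.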