c++ multithreading gcc assembly thread-local-storage

What is gcc doing here to run this code once per thread?


I just ran across this technique for running code once per thread, but I don't know how it works at the lowest level. In particular, what is fs pointing to? What does .zero 8 mean? Is there a reason the identifier ends in @tpoff?

int foo();

void bar()
{
    thread_local static auto _ = foo();
}

Output (with -O2):

bar():
        cmp     BYTE PTR fs:guard variable for bar()::_@tpoff, 0
        je      .L8
        ret
.L8:
        sub     rsp, 8
        call    foo()
        mov     BYTE PTR fs:guard variable for bar()::_@tpoff, 1
        add     rsp, 8
        ret
guard variable for bar()::_:
        .zero   8

Solution

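  • On x86-64 Linux, the fs segment base is the thread pointer: it points at the current thread's control block, and that thread's TLS block for the executable sits just below it. So an fs-relative address always refers to the current thread's own storage.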

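    .zero 8 reserves 8 bytes of zeros. The listing doesn't show section directives, but a zero-initialized thread-local object normally goes in the thread-local BSS (.tbss) rather than the ordinary BSS. The guard is 8 bytes because the Itanium C++ ABI makes guard variables 64 bits wide, even though the inline code only tests the first byte. See the GAS manual: https://sourceware.org/binutils/docs/as/Zero.html, and more links in https://stackoverflow.com/tags/x86/info.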

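    @tpoff stands for thread-pointer offset: the assembler and linker fill in the symbol's offset relative to the thread pointer (the fs base), so fs:symbol@tpoff is the current thread's copy of the symbol. This is the local-exec TLS access model, which the compiler can use when the variable is defined in the executable itself.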
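    Seen from C++, the effect of fs-relative addressing is that every thread reaches its own copy of a thread_local object. The following is a minimal sketch of that behaviour, not anything from the question: tls_counter, address_of_my_copy and check_tls_copies are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical thread-local variable, for illustration only.
thread_local int tls_counter = 0;

// Address of the calling thread's copy of tls_counter.
std::uintptr_t address_of_my_copy() {
    return reinterpret_cast<std::uintptr_t>(&tls_counter);
}

// Shows that a second thread gets a fresh, separate copy.
void check_tls_copies() {
    tls_counter = 1;                       // write to the caller's copy
    const std::uintptr_t mine = address_of_my_copy();

    std::uintptr_t theirs = 0;
    int their_initial = -1;
    std::thread other([&] {
        their_initial = tls_counter;       // new thread starts from the initializer
        tls_counter = 42;                  // only changes the new thread's copy
        theirs = address_of_my_copy();
    });
    other.join();

    assert(their_initial == 0);            // fresh copy, not the caller's 1
    assert(mine != theirs);                // both threads were alive, so the copies are distinct
    assert(tls_counter == 1);              // the caller's copy was never touched
}
```

    The same source-level name resolves to a different address in each thread, which is what the thread-pointer-relative offset provides at the instruction level.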


    The rest of it looks similar to what gcc normally does for static local variables that need a runtime initializer: a guard variable that it checks every time it enters the function, falling through in the already-initialized case.
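    The check-then-initialize pattern described above can be written by hand roughly as follows. This is only a sketch of what the generated code amounts to, not what GCC literally emits: foo_stub stands in for the question's foo(), and bar_guard plays the role of "guard variable for bar()::_". Both names are hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for foo(); counts how often it runs.
int foo_stub_calls = 0;
int foo_stub() { return ++foo_stub_calls; }

// Plays the role of the compiler-generated thread-local guard variable.
thread_local bool bar_guard = false;

void bar_by_hand() {
    if (bar_guard) {
        return;            // already initialized in this thread: the fast path
    }
    foo_stub();            // result discarded, like the unused _ in the question
    bar_guard = true;      // set only after the initializer has returned
}

// Single-threaded check: three calls, one initialization.
void check_bar_by_hand() {
    bar_by_hand();
    bar_by_hand();
    bar_by_hand();
    assert(foo_stub_calls == 1);
}
```

    Setting the guard after the call matches the assembly in the question, and it also means that if the initializer throws, the next call tries again.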

    The guard variable is in thread-local storage; 8 bytes are reserved for it, but only its first byte is checked and set. The actual _ itself is optimized away because it's never read. Notice there's no store of eax after foo returns.

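    BTW, _ is a weird (bad) choice for a variable name: it's easy to miss. It isn't actually reserved at block scope (names that begin with an underscore are reserved to the implementation only in the global namespace), but a descriptive name would still be clearer.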


    There's a nice simplification here: normally (for a non-thread-local static int var = foo();), if the guard variable shows that the object isn't initialized yet, the function needs a thread-safe way to make sure only one thread actually does the initialization (essentially taking a lock; GCC calls __cxa_guard_acquire and __cxa_guard_release for this).

    But here each thread has its own guard variable (and should run foo() the first time regardless of what other threads are doing), so it doesn't need to call a helper such as __cxa_guard_acquire to get mutual exclusion.
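    The difference in semantics between the two kinds of static local can be observed directly. Below is a minimal sketch under assumed names (init_per_thread, init_process_wide, once_per_thread, once_per_process and check_once_semantics are all hypothetical); the initializers just bump atomic counters so the number of runs can be checked afterwards.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical counters recording how often each initializer runs.
std::atomic<int> per_thread_inits{0};
std::atomic<int> process_wide_inits{0};

int init_per_thread()   { return ++per_thread_inits; }
int init_process_wide() { return ++process_wide_inits; }

// Same shape as bar() in the question: one initialization per thread.
void once_per_thread() {
    thread_local static auto unused = init_per_thread();
    (void)unused;
}

// Ordinary static local: one initialization for the whole process,
// so the compiler must synchronize the first call across threads.
void once_per_process() {
    static auto unused = init_process_wide();
    (void)unused;
}

// Starts four worker threads that each call both functions three times.
void check_once_semantics() {
    const int thread_count = 4;
    std::vector<std::thread> workers;
    for (int i = 0; i < thread_count; ++i) {
        workers.emplace_back([] {
            for (int call = 0; call < 3; ++call) {
                once_per_thread();
                once_per_process();
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    assert(per_thread_inits == thread_count);  // once in each worker thread
    assert(process_wide_inits == 1);           // once overall
}
```

    Compiling once_per_process with -O2 and comparing its assembly against bar() is one way to see the extra guard-acquire call that the thread-local version avoids.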

    (sorry for the short answer, I may expand this later with an example on https://godbolt.org/ of a non-thread-local static local variable. Or find an SO Q&A about it.)