__thread Foo foo;
How is foo actually resolved? Does the compiler silently replace every instance of foo with a function call? Or is foo stored somewhere relative to the bottom of the stack, with the compiler recording something like "hey, for each thread, reserve this space near the bottom of the stack, and foo lives at offset x from the bottom of the stack"?
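It's a little complicated (this document explains it in great detail), but it's basically neither. Instead, the compiler emits special sections in the executable: .tdata, which holds the initial values of initialized thread-local variables, and .tbss, which covers the zero-initialized ones. At runtime, each thread gets its own block of memory, initialized with a copy of the (read-only) .tdata image, and a per-thread pointer (on x86-64 Linux, the fs segment base) locates that block. An access to foo is then compiled as a fixed offset from that pointer. Because the pointer is part of the thread's context, the active block is switched automatically whenever threads are switched.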
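The end result is that, in the main executable, accessing a __thread variable is nearly as fast as accessing a regular variable, and it doesn't take up extra stack space, either. (In a shared library, an access may instead go through a call to __tls_get_addr, which is somewhat slower.)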
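To see the per-thread copies in practice, here is a minimal sketch, assuming GCC or Clang (for the __thread extension) and POSIX threads. The names counter, worker, and run_tls_demo are made up for illustration. Each worker starts from the initial value 100, no matter what the main thread did to its own copy, and the workers' copies live at different addresses than the main thread's.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Initialized thread-local variable: its initial value (100) is part of
 * the TLS initialization image that every new thread's copy starts from. */
static __thread int counter = 100;

struct result {
    int final_value;    /* value of this thread's counter when it finished */
    uintptr_t address;  /* where this thread's copy of counter lives */
};

static void *worker(void *arg) {
    struct result *out = arg;
    for (int i = 0; i < 5; i++) {
        counter++;      /* touches only the calling thread's copy */
    }
    out->final_value = counter;
    out->address = (uintptr_t)&counter;
    return NULL;
}

/* Hypothetical helper: returns 0 if the copies behaved independently. */
int run_tls_demo(void) {
    pthread_t threads[2];
    struct result results[2];

    counter = 7;        /* changes the main thread's copy only */

    for (int i = 0; i < 2; i++) {
        if (pthread_create(&threads[i], NULL, worker, &results[i]) != 0) {
            return -1;
        }
    }
    for (int i = 0; i < 2; i++) {
        if (pthread_join(threads[i], NULL) != 0) {
            return -1;
        }
    }

    for (int i = 0; i < 2; i++) {
        /* Each worker started from 100, not from the main thread's 7. */
        if (results[i].final_value != 105) {
            return 1;
        }
        /* A worker's copy is a different object than the main thread's. */
        if (results[i].address == (uintptr_t)&counter) {
            return 1;
        }
    }

    /* The workers' increments never reached the main thread's copy. */
    assert(counter == 7);
    return 0;
}
```

Call run_tls_demo() from main and build with -pthread; a return value of 0 means every thread saw its own independent counter. Disassembling worker (for example with objdump -d) on x86-64 Linux should show the accesses as %fs-relative loads and stores rather than function calls, though the exact code depends on the compiler, the flags, and whether the code ends up in a shared library.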