c++ thread-local-storage

The Cost of thread_local


Now that C++ is adding thread_local storage as a language feature, I'm wondering a few things:

  1. What is the cost of thread_local likely to be?
    • In memory?
    • For read and write operations?
  2. Related to that: how do operating systems usually implement this? It would seem that anything declared thread_local has to be given its own storage space for each thread created.

Solution

  • Storage space: sizeof(var) * number of threads, or possibly (sizeof(var) + sizeof(var*)) * number of threads if the implementation also keeps a per-thread pointer to each variable.

    There are two basic ways of implementing thread-local storage:

    1. Using some sort of system call that gets information about the current kernel thread. Sloooow.

    2. Using some pointer, probably in a processor register, that is set properly at every thread context switch by the kernel - at the same time as all the other registers. Cheap.

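    On Intel platforms, variant 2 is usually implemented via a segment register: FS or GS, depending on the operating system and on whether the code is 32-bit or 64-bit (for example, x86-64 Linux uses FS and 64-bit Windows uses GS). Both GCC and MSVC support this. Access times are therefore about as fast as for global variables.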

    It is also possible, though I haven't seen it in practice, for thread_local to be implemented on top of existing library functions like pthread_getspecific. Performance would then be that of variant 1 or 2, plus library call overhead. Keep in mind that variant 2 plus library call overhead is still a lot faster than a kernel call.
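
To make the "one copy per thread" point concrete, here is a minimal sketch that keeps the same per-thread counter two ways: once as a thread_local variable and once behind a pthread key read with pthread_getspecific. It assumes a POSIX system; the names tls_counter, key_counter, TlsReport and run_tls_demo are made up for this illustration. It only shows the storage semantics and does not measure anything, so it says nothing about how a given compiler actually implements thread_local.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <pthread.h>
#include <set>
#include <thread>
#include <vector>

// All names below are hypothetical, chosen for this sketch.

// Language-level TLS: every thread gets its own copy of this variable.
thread_local int tls_counter = 0;

// Library-level TLS: one process-wide key, one heap-allocated int per thread.
static pthread_key_t counter_key;
static pthread_once_t counter_key_once = PTHREAD_ONCE_INIT;

// Called by the pthread runtime when a thread with a non-null value exits.
static void destroy_counter(void* p) { delete static_cast<int*>(p); }

static void make_counter_key() {
    int rc = pthread_key_create(&counter_key, destroy_counter);
    assert(rc == 0);
    (void)rc;  // avoid an unused-variable warning when NDEBUG is defined
}

// Returns the calling thread's counter, allocating it on first use.
int& key_counter() {
    pthread_once(&counter_key_once, make_counter_key);
    void* p = pthread_getspecific(counter_key);
    if (p == nullptr) {
        p = new int(0);
        pthread_setspecific(counter_key, p);
    }
    return *static_cast<int*>(p);
}

struct TlsReport {
    std::size_t distinct_tls_addresses;  // separate copies of tls_counter observed
    bool counts_were_private;            // each thread saw only its own increments
};

TlsReport run_tls_demo(int num_threads, int iterations) {
    std::mutex m;
    std::set<std::uintptr_t> addresses;
    bool all_private = true;
    std::atomic<int> arrived{0};
    std::vector<std::thread> threads;

    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                ++tls_counter;    // no lock: no other thread can see this copy
                ++key_counter();  // same effect through the pthread key
            }
            {
                std::lock_guard<std::mutex> lock(m);
                addresses.insert(reinterpret_cast<std::uintptr_t>(&tls_counter));
                all_private = all_private && tls_counter == iterations
                                          && key_counter() == iterations;
            }
            // Keep every thread alive until all have reported, so that no
            // thread's storage is freed and reused while addresses are collected.
            ++arrived;
            while (arrived.load() < num_threads) std::this_thread::yield();
        });
    }
    for (auto& th : threads) th.join();
    return {addresses.size(), all_private};
}
```

Calling run_tls_demo(4, 1000) from main should report four distinct addresses for tls_counter, one per worker thread, while the calling thread's own tls_counter and key_counter() stay untouched. The pthread version needs a function call and a null check on every access, which is the library call overhead mentioned above.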