Per-thread singleton-like using thread_local storage

Are there any caveats to using thread_local storage duration like this:

template <class T>
inline T &thread_local_get()
{
  thread_local T t;
  return t;
}

Then, in different threads (for example):

thread_local_get<float>() += 1.f;

The doc at cppreference says this about thread local storage duration:

thread storage duration. The object is allocated when the thread begins and deallocated when the thread ends. Each thread has its own instance of the object. Only objects declared thread_local have this storage duration. thread_local can appear together with static or extern to adjust linkage.

Does this correctly allocate one thread_local instance for each T (one per template instantiation, fixed at compile time) and each calling thread? Is there any situation that can lead to, e.g., undefined behavior?


  • I don't see any caveats in theory: once instantiated, each specialization of the template should behave (from the compiler's point of view) exactly like a normal function, so each T gets its own thread_local variable.

    Still, I would recommend checking your compiler's support for thread_local before relying on it. For example, GCC had a bug with thread_local static class members, which still seems to be present at least in the latest TDM-GCC distribution featuring GCC 5.1.0. I don't know whether this particular bug also affects thread_local variables declared inside functions (it should not), and you are probably using a different compiler, but my suggestion is still to run some experiments before using this feature.
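One such experiment could look like the sketch below. It reuses thread_local_get from the question and checks that every new thread starts from its own zero-initialized instance, and that the instances for float and int are separate. The helpers bump_in_new_thread and instances_are_independent are hypothetical names introduced only for this illustration, and the sketch assumes a C++11 (or later) compiler with working thread support.

```cpp
#include <cassert>
#include <thread>

template <class T>
inline T &thread_local_get()
{
  // One variable per specialization and per thread. Objects with thread
  // storage duration are zero-initialized, so a float starts at 0.f.
  thread_local T t;
  return t;
}

// Hypothetical helper: increments the float instance `times` times inside a
// freshly started thread and returns the value that thread ended up with.
inline float bump_in_new_thread(int times)
{
  float result = 0.f;
  std::thread worker([&] {
    for (int i = 0; i < times; ++i)
      thread_local_get<float>() += 1.f;
    result = thread_local_get<float>();
  });
  worker.join();  // join() makes the write to `result` visible here
  return result;
}

// Hypothetical check: the calling thread's instances must not be affected by
// what other threads do, and float and int must not share storage.
inline bool instances_are_independent()
{
  thread_local_get<float>() = 10.f;  // calling thread, T = float
  thread_local_get<int>() = 7;       // calling thread, T = int

  const float a = bump_in_new_thread(3);  // new thread starts from 0.f
  const float b = bump_in_new_thread(5);  // another new thread, also from 0.f

  return a == 3.f && b == 5.f &&
         thread_local_get<float>() == 10.f &&
         thread_local_get<int>() == 7;
}
```

Calling assert(instances_are_independent()) from main (and linking with thread support, e.g. -pthread on GCC) should pass on a conforming compiler; a failure would suggest the kind of compiler issue mentioned above.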