c++multithreading thread-local c++-coroutine

`thread_local` variables and coroutines

Before coroutines we used callbacks to run asynchronous operations. Callbacks was normal functions and could have thread_local variables.

Let see this example:

void StartTcpConnection(void) 
{
    using namespace std;
    thread_local int my_thread_local = 1;
    cout << "my_thread_local = " << my_thread_local << endl;
    auto tcp_connection = tcp_connect("127.0.0.1", 8080);
    tcp_connection.async_wait(TcpConnected);
}

void TcpConnected(void)
{
    using namespace std;
    thread_local int my_thread_local = 2;
    cout << "my_thread_local = " << my_thread_local << endl;
}

As we see from code, I have some (undocumented here) tcp_connect function that connects to TCP endpoint and returns tcp_connection object. This object can wait until TCP connection will really occur and call TcpConnected function. Because we don't know specific implementation of tcp_connect and tcp_connection, we don't know will it call TcpConnected on the same or on different thread, both implementations are possible. But we know for sure that my_thread_local is different for 2 different functions, because each function has its own scope.

If we need this variable to be the same (as soon as thread is the same), we can create 3rd function that will return reference to thread_local variable:

int& GetThreadLocalInt(void)
{
    thread_local int my_variable = 1;
    return my_variable;
}

So, we have full control and predictability: we know for sure that variables will be different if TcpConnected and StartTcpConnection will run on different threads, and we know that we can have them different or the same depending on our choice when these functions will run on the same thread.

Now let see coroutine version of the same operation:

void Tcp(void)
{
    thread_local int my_thread_local = 1;
    auto tcp_connection = co_await tcp_connect("127.0.0.1", 8080);
    cout << "my_thread_local = " << my_thread_local << endl;
}

This situation is a bit questionable for me. I still need thread local storage, it is important language feature that I don't want to abandon. However, we here have 2 cases:

Thread before co_await is the same one as after co_await. What will happen with my_thread_local? Will it be the same variable before and after co_await, especially if we'll use GetThreadLocalInt function to get its reference instead of value?
Thread changes after co_await. Will C++ runtime reinitialize my_thread_local to value from new thread, or make a copy of previous thread value, or may be use reference to the same data? And similar question for GetThreadLocalInt function, it returns reference to thread_local object, but the reference storage itself is auto, will coroutine reinitialize it to new thread, or we'll get (dangerous!!!) race condition, because thread 2 will strangely get reference to thread 1 thread local data and potentially use it in parallel?

Even it is easy to debug and test what will happen on any specific compiler, the important question is whether standard says us something about that, otherwise even if we'll test it on VC++ or gcc an see that it behaves somehow on these 2 popular compilers, the code may loose portability and compile differently on some exotic compilers.

Solution

For global thread_local variables, the coroutine behavior ought to be as expected (and MSVC seems to have a bug with this). But function-local thread_local variables in coroutines, there seems to be a hole in the specification. Indeed, I'm not sure the wording makes sense even without coroutines.

[stmt.dcl]/3 says:

Dynamic initialization of a block variable with static storage duration or thread storage duration is performed the first time control passes through its declaration; such a variable is considered initialized upon the completion of its initialization.

The problem is that, while by C++'s rules there is only one thread_local variable, there are multiple objects represented by that one variable. And those objects need to be initialized. So... how does that happen?

The only sane interpretation of this is that the object for a particular thread gets initialized the first time control flow passes through the declaration on that thread. And that's the problem.

If using co_await the thread a function executes on changes, then how could you pass through the declaration on the new thread? Which means that the thread_local for that thread should be zero-initialized.

Ultimately, I would say that you should never use thread_local in a coroutine function. It's just not clear what value the thread_local ought to have. And the only logical value for the new thread_local to have is the one from the previous thread. But that's not how a thread_local is supposed to work. Overall, the idea feels inherently nonsensical (and I would say that the standard should have explicitly forbid the declaration of thread_locals in a coroutine, just as they did for using co_await in a thread_local's initializer).

Just use a namespace-scoped thread_local variable. Outside of the aforementioned MSVC bug, it ought to work and make sense.