I have a background thread for uploading files. It runs in a loop; it does some work, then sleeps until a timeout elapses or until it is explicitly notified via a condition variable that there is more work to do. The problem is that sometimes I'm unable to get the thread to exit quickly.
Here is a simplified version:
std::thread g_thread;
std::mutex g_mutex;
std::condition_variable g_cond;
bool g_stop = false;
void threadLoop()
{
while (!g_stop)
{
printf("doing some stuff\n");
std::unique_lock<std::mutex> lock(g_mutex);
g_cond.wait_for(lock, std::chrono::seconds(15));
}
}
int main(int argc, char* argv[])
{
g_stop = false;
g_thread = std::thread(threadLoop);
printf("hello\n");
g_stop = true;
g_cond.notify_one();
g_thread.join();
}
When I run this test program, I would expect it to exit quickly, but sometimes it gets stuck in the wait_for(). I think maybe the notify_one() is happening before the thread is sleeping in the wait_for(), but after the check of g_stop.
Is there a simple solution to this, or another design pattern that would work better?
You are reading and writing the g_stop
variable without any synchronisation (such as using atomic operations, or using a mutex to serialize accesses to it). That is a data race, which is undefined behaviour.
Because you don't access it safely the compiler is allowed to assume that no other thread ever modifies g_stop
, so in the threadLoop
function it can load it into a register once and then never read the variable again, but just keep looping.
To ensure the write to the variable is seen by the looping thread you should use std::atomic<bool>
or lock the mutex before all reads/writes to that variable. If you use an atomic<bool>
that will fix the undefined behaviour, but doesn't ensure the thread will not wait on the condition variable, because as you suggest there is a window between checking the value of g_stop
and going to sleep, in which the main thread can set g_stop = true
and signal the condvar, so the looping thread doesn't wait until after the notify_one() call, and so misses it.
This slightly changed version will ensure that the thread will not wait on the condition variable if the main thread has told it to stop:
std::thread g_thread;
std::mutex g_mutex;
std::condition_variable g_cond;
bool g_stop = false;
void threadLoop()
{
std::unique_lock<std::mutex> lock(g_mutex);
while (!g_stop)
{
printf("doing some stuff\n");
g_cond.wait_for(lock, std::chrono::seconds(15));
}
}
int main(int argc, char* argv[])
{
g_stop = false;
g_thread = std::thread(threadLoop);
printf("hello\n");
{
std::lock_guard<std::mutex> lock(g_mutex);
g_stop = true;
}
g_cond.notify_one();
g_thread.join();
}
This works because the looping thread holds a lock on the mutex while it checks g_stop
, and it keeps holding that lock until it starts to wait on the condvar. The main thread takes the lock to set g_stop = true
, which it can only do while the other thread is waiting.
This means there are only two possible executions now. g_stop = true
happens while the thread is waiting on the condvar, and either it wakes up just before the notify_one() call or it wakes up because of the notify_one() call, but in both cases it will see g_stop == true
immediately and stop looping.