Search code examples
c++concurrencymutexcondition-variable

Why do we need an empty std::lock_guard before doing condition variable notify?


I am currently studying Google's Filament job system. You can find the source code here. The part that confuses me is this requestExit() method:

void JobSystem::requestExit() noexcept {
    mExitRequested.store(true);

    { std::lock_guard<Mutex> lock(mLooperLock); }
    mLooperCondition.notify_all();

    { std::lock_guard<Mutex> lock(mWaiterLock); }
    mWaiterCondition.notify_all();
}

I am confused why we need to lock and unlock even though there is no action in between the lock and unlock. Are there any cases where this empty lock and unlock is necessary?


Solution

  • This is a bit of a hack. First, let's look at the code without that:

    mExitRequested.store(true);
    mLooperCondition.notify_all();
    

    There's a possible race condition here. Some other code might have noticed that mExitRequested was false and started waiting for mLooperCondition right after we called notify_all.

    The race would be:

    1. Other thread checks mExitRequested, it's false.
    2. We set mExitRequested to true.
    3. We call mLooperCondition.notify_all.
    4. Other thread waits for mLooperCondition.
    5. Oops. Waiting for notify that already happened.

    But in order to wait for a condition variable, you must hold the associated mutex. So that can only happen if some other thread held the mLooperLock mutex. In fact, step 4 would really be: "Other thread releases mLooperLock and waits for mLooperCondition.

    So, for this race to happen, it must happen precisely like this:

    1. Other thread acquires mLooperLock.
    2. Other thread checks mExitRequested, it's false.
    3. We set mExitRequested to true.
    4. We call mLooperCondition.notify_all.
    5. Other thread waits for mLooperCondition, releasing mLooperLock.
    6. Oops. Waiting for notify that already happened.

    So, if we change the code to:

    mExitRequested.store(true);
    { std::lock_guard<Mutex> lock(mLooperLock); }
    mLooperCondition.notify_all();
    

    That ensures that no other thread could check mExitRequested and see false and then wait for mLooperCondition. Because the other thread would have to hold the mLooperLock lock through the whole process, which can't happen since we acquired it in the middle of that process.

    Trying it again:

    1. Other thread acquires mLooperLock.
    2. Other thread checks mExitRequested, it's false.
    3. We set mExitRequested to true.
    4. By acquiring and releasing nLooperLock, we do not make any forward progress until the other thread releases mLooperLock.
    5. We call mLooperCondition.notify_all.

    Now, either the other thread blocks on the condition or it doesn't. If it doesn't, there's no problem. If it does, there's still no problem because the unlocking of mLooperLock is the condition variable's atomic "unlock and wait" operation, guaranteeing that it sees our notify.