Tags: c++, c++14, mutex, condition-variable, stdatomic

Avoiding deadlock in concurrent waiting object


I've implemented a "Ticket" class which is shared as a shared_ptr between multiple threads.

The program flow is like this:

  1. parallelQuery() is called to start a new query job. A shared instance of Ticket is created.
  2. The query is split into multiple tasks, and each task is enqueued on a worker thread (this part is important, otherwise I'd just join the threads and be done). Each task gets the shared ticket.
  3. ticket.wait() is called to wait for all tasks of the job to complete.
  4. When one task is done it calls the done() method on the ticket.
  5. When all tasks are done, the ticket is unlocked, the result data from the tasks is aggregated, and the result is returned from parallelQuery()

In pseudo code:

     std::vector<T> parallelQuery(std::string str) {
         auto ticket = std::make_shared<Ticket>(2);
         auto task1 = std::make_unique<Query>(ticket, str+"a");
         addTaskToWorker(task1);
         auto task2 = std::make_unique<Query>(ticket, str+"b");
         addTaskToWorker(task2);
         ticket->waitUntilDone();
         auto result = aggregateData(task1, task2);
         return result;
     }
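
For context, here is a minimal sketch of the worker side (run() and the member names are simplified placeholders; the real Query class does more, but the only contract with Ticket is that each task calls done() once it finishes):

     #include <memory>
     #include <string>

     // Placeholder task: calls done() exactly once on the shared ticket when it finishes.
     class Query {
     public:
         Query(std::shared_ptr<Ticket> ticket, std::string str)
             : _ticket(std::move(ticket)), _str(std::move(str)) {}

         void run() {
             // ... execute the query for _str and store the partial result ...
             _ticket->done(); // signal completion to the waiting caller
         }

     private:
         std::shared_ptr<Ticket> _ticket;
         std::string _str;
     };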

My code works. But I wonder whether it is theoretically possible for it to deadlock in the case where the last task unlocks the mutex right before it gets locked again by the waiter thread calling waitUntilDone().

Is this a possibility, and how can I avoid this trap?

Here is the complete Ticket class, note the execution order example comments related to the problem description above:

#include <mutex>
#include <atomic>

    class Ticket {
    public:
        Ticket(int numTasks = 1) : _numTasks(numTasks), _done(0), _canceled(false) {
            _mutex.lock();
        }

        void waitUntilDone() {
            _doneLock.lock();
            if (_done != _numTasks) {
                _doneLock.unlock(); // Execution order 1: "waiter" thread is here
                _mutex.lock(); // Execution order 3: "waiter" thread is now in a deadlock?
            }
            else {
                _doneLock.unlock();
            }
        }

        void done() {
            _doneLock.lock();
            _done++;
            if (_done == _numTasks) {
                _mutex.unlock(); // Execution order 2: "task1" thread unlocks the mutex
            }
            _doneLock.unlock();
        }

        void cancel() {
            _canceled = true;
            _mutex.unlock();
        }

        bool wasCanceled() {
            return _canceled;
        }

        bool isDone() {
            return _done >= _numTasks;
        }

        int getNumTasks() {
            return _numTasks;
        }

    private:
        std::atomic<int> _numTasks;
        std::atomic<int> _done;
        std::atomic<bool> _canceled;
        // mutex used for caller wait state
        std::mutex _mutex;
        // mutex used to safeguard done counter with lock condition in waitUntilDone
        std::mutex _doneLock;
    };

One possible solution, which just came to my mind when editing the question, is that I can put _done++; before the _doneLock.lock(). Would that already be enough?

Update

I've updated the Ticket class based on the suggestions provided by Tomer and Phil1970. Does the following implementation avoid the mentioned pitfalls?

#include <atomic>
#include <condition_variable>
#include <mutex>

class Ticket {
public:
    Ticket(int numTasks = 1) : _numTasks(numTasks), _done(0), _canceled(false) { }

    void waitUntilDone() {
        std::unique_lock<std::mutex> lock(_mutex);
        // loop to avoid spurious wakeups
        while (_done != _numTasks && !_canceled) {
            _condVar.wait(lock);
        }
    }

    void done() {
        std::unique_lock<std::mutex> lock(_mutex);
        // just bail out in case we call done more often than needed
        if (_done == _numTasks) {
            return;
        }
        _done++;
        _condVar.notify_one();
    }

    void cancel() {
        std::unique_lock<std::mutex> lock(_mutex);
        _canceled = true;
        _condVar.notify_one();
    }

    bool wasCanceled() const {
        return _canceled;
    }

    bool isDone() const {
        return _done >= _numTasks;
    }

    int getNumTasks() const {
        return _numTasks;
    }

private:
    std::atomic<int> _numTasks;
    std::atomic<int> _done;
    std::atomic<bool> _canceled;
    std::mutex _mutex;
    std::condition_variable _condVar;
};
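
For completeness, this is roughly how I exercise the class (simplified to plain std::thread here instead of the worker queue described above, so the thread handling is only a sketch):

#include <memory>
#include <thread>

int main() {
    auto ticket = std::make_shared<Ticket>(2);

    // Each thread stands in for one enqueued task finishing its share of the job.
    std::thread t1([ticket] { /* ... run query "a" ... */ ticket->done(); });
    std::thread t2([ticket] { /* ... run query "b" ... */ ticket->done(); });

    ticket->waitUntilDone(); // returns once both tasks have reported done()

    t1.join();
    t2.join();
    return 0;
}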

Solution

  • Don't write your own wait methods but use std::condition_variable instead.

    https://en.cppreference.com/w/cpp/thread/condition_variable.

    Mutex usage

    Generally, a mutex should protect a given region of code. That is, it should lock, do its work, and unlock. In your class, you have multiple methods where some lock _mutex while others unlock it. This is very error-prone: if you call the methods in the wrong order, you might well end up in an inconsistent state. What happens if a mutex is locked twice, or unlocked when it is already unlocked?

    The other thing to be aware of with mutexes is that if you have more than one, you can easily deadlock when you need to lock both but don't do it in a consistent order. Suppose thread A locks mutex 1 first and then mutex 2, while thread B locks them in the opposite order (mutex 2 first). Something like this can occur (a sketch of how to avoid it follows the list):

    • Thread A locks mutex 1
    • Thread B locks mutex 2
    • Thread A wants to lock mutex 2 but cannot, as it is already locked.
    • Thread B wants to lock mutex 1 but cannot, as it is already locked.
    • Both threads wait forever
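
    This lock-ordering sketch is not from the original answer; it only illustrates the standard C++14 way to take two mutexes without risking that interleaving, by acquiring them together with std::lock (std::scoped_lock does the same in C++17):

    #include <mutex>

    std::mutex m1, m2;

    void threadA() {
        std::unique_lock<std::mutex> lk1(m1, std::defer_lock);
        std::unique_lock<std::mutex> lk2(m2, std::defer_lock);
        std::lock(lk1, lk2); // locks both deadlock-free, regardless of what thread B does
        // ... work with the data protected by both mutexes ...
    }

    void threadB() {
        std::unique_lock<std::mutex> lk2(m2, std::defer_lock);
        std::unique_lock<std::mutex> lk1(m1, std::defer_lock);
        std::lock(lk1, lk2); // same deadlock-free acquisition, even though the locks are named in reverse
        // ... work with the data protected by both mutexes ...
    }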

    So in your code, you should at least have some checks to ensure proper usage. For example, you should verify _canceled before unlocking the mutex to ensure cancel is called only once.

    Solution

    I will just give some ideas.

    Declare a mutex and a condition_variable to manage the done condition in your class.

    std::mutex doneMutex;
    std::condition_variable doneCondition;
    

    Then waitUntilDone would look like:

    void waitUntilDone()
    {
        std::unique_lock<std::mutex> lk(doneMutex);
        doneCondition.wait(lk, [this]{ return isDone() || wasCanceled(); });
    }
    

    And done function would look like:

    void done() 
    {
        std::lock_guard<std::mutex> lk(doneMutex);
        _done++;
        if (_done == _numTasks) 
        {
            doneCondition.notify_one();
        }
    }
    

    And cancel function would become

    void cancel()
    {
        std::lock_guard<std::mutex> lk(doneMutex);
        _canceled = true;
        doneCondition.notify_one();
    }
    

    As you can see, you only have one mutex now, so you basically eliminate the possibility of a deadlock.

    Variable naming

    I suggest that you not use lock in the name of your mutex, since it is confusing.

    std::mutex someMutex;
    std::lock_guard<std::mutex> someLock(someMutex); // std::unique_lock when needed
    

    That way, it is far easier to know which variable refers to the mutex and which one to the lock on the mutex.

    Good reading

    If you are serious about multithreading, then you should get this book:

    C++ Concurrency in Action
    Practical Multithreading
    Anthony Williams

    Code Review (added section)

    Essentially the same code has been posted to Code Review: https://codereview.stackexchange.com/questions/225863/multithreading-ticket-class-to-wait-for-parallel-task-completion/225901#225901.

    I have put an answer there that includes some extra points.