c++ multithreading c++11 c++17 api-design

Sharing data between API and application threads

My API computes some data in its own thread:

/*** API is running in its own thread ***/
class API {
public:
    std::shared_ptr<Data> retrieveData() { return mData; }

private:
    std::shared_ptr<Data> mData;
    std::mutex mDataMutex;

    void run () {
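        std::thread t([this](){  // capture this to reach the members
            while (!exitApi) {
                // Held for one update; released at the end of each iteration.
                std::lock_guard<std::mutex> lock(mDataMutex);

                updateData(mData);
            }
        });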
        t.join();
    }
};

An application that uses my API will retrieve the shared data in another thread:

/*** Application is running in another thread ***/
class Application {
private:
    API mApi;

    void run () {
        std::thread t([this](){  // capture this to reach mApi
            while (!exitApp) {
                std::shared_ptr<Data> data = mApi.retrieveData();

                /* API thread can update the data while the App is using it! */
                useData(data);
            }
        });
        t.join();
    }
};

How can I design my API so that there are no pitfalls for the application developer when retrieving the data? I can think of three options, but I do not like any of them:

- Instead of sharing the pointer, the API returns a copy of all the data. However, the data can get quite large, so copying it should be avoided.
- The API locks the data when it hands it over, and the application must explicitly ask the API to unlock it again after performing all computations. Even if documented correctly, this is very prone to deadlocks.
- When the API hands over the data, retrieveData also returns an already locked std::unique_lock, which the application has to unlock once it is done with the data. This is potentially less error-prone, but still not very obvious for the application developer.

Is there a better way to design the API (in modern C++11 and beyond) so that it is as developer-friendly as possible?

Solution

TL;DR: Use shared_ptr with a custom deleter that calls unlock.
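
As a rough illustration of that idea, here is a minimal sketch, not a definitive implementation. It assumes the API owns the data by value, so the deleter only unlocks and deletes nothing. The Data layout, update(), and example() are hypothetical names used for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical payload; the question does not show the real Data type.
struct Data {
    std::vector<int> values;
};

class API {
public:
    // Locks the mutex and hands out a pointer whose deleter unlocks it.
    // The data stays owned by the API, so the deleter does not delete anything.
    std::shared_ptr<Data> retrieveData() {
        mDataMutex.lock();
        // If the shared_ptr constructor throws, it still invokes the deleter,
        // so the mutex is released on that path as well.
        return std::shared_ptr<Data>(&mData, [this](Data*) { mDataMutex.unlock(); });
    }

    // Writer side (the API thread in the question); takes the same lock.
    void update(int value) {
        std::lock_guard<std::mutex> lock(mDataMutex);
        mData.values.push_back(value);
    }

private:
    Data mData;
    std::mutex mDataMutex;
};

// Hypothetical usage: the lock lives exactly as long as some copy of the pointer.
std::size_t example() {
    API api;
    api.update(1);
    {
        std::shared_ptr<Data> data = api.retrieveData();  // lock taken
        std::shared_ptr<Data> copy = data;                // copies share the one lock
        copy->values.push_back(2);
        assert(data->values.size() == 2);
    }                                                     // last copy destroyed: unlock
    api.update(3);  // would block forever if the lock were still held
    return api.retrieveData()->values.size();
}
```

Two caveats with this sketch: a std::mutex has to be unlocked by the thread that locked it, so the last copy of the pointer should be released on the thread that called retrieveData(). Calling retrieveData() again on the same thread while a pointer is still alive would deadlock. The API object must also outlive every pointer it hands out.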

I think the two main approaches are:

Returning an immutable data structure so it can be shared between threads. This makes for a clean API, but (as already mentioned) copying could be expensive. Some ways to reduce the need for copying:
- Use a copy-on-write data structure so that only some portions of the data need to be copied each time. Depending on your data, this may not be possible, or it may be too much work to refactor.
- Use move semantics (rvalue references) where possible to reduce the cost of copying. This alone probably won't be enough, but that depends on your actual data.
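
The immutable approach could be sketched roughly as follows. This is a minimal sketch that assumes a single writer (the API thread). SnapshotAPI, update(), example(), and the Data layout are hypothetical names. Readers receive a shared_ptr<const Data> snapshot, and the writer builds the next version from a copy and swaps the pointer under a short lock, so the full copy mentioned above is still paid once per update.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical payload; the question does not show the real Data type.
struct Data {
    std::vector<int> values;
};

class SnapshotAPI {
public:
    SnapshotAPI() : mData(std::make_shared<const Data>()) {}

    // Readers copy only the pointer under the lock, never the data itself.
    // The snapshot they get is const and is never modified afterwards.
    std::shared_ptr<const Data> retrieveData() const {
        std::lock_guard<std::mutex> lock(mDataMutex);
        return mData;
    }

    // Writer side: build the next version off to the side, then publish it.
    // Assumes only one thread ever calls update().
    void update(int value) {
        std::shared_ptr<const Data> current = retrieveData();
        auto next = std::make_shared<Data>(*current);  // the expensive full copy
        next->values.push_back(value);
        std::lock_guard<std::mutex> lock(mDataMutex);
        mData = std::move(next);
    }

private:
    mutable std::mutex mDataMutex;
    std::shared_ptr<const Data> mData;
};

// Hypothetical usage: an old snapshot is unaffected by later updates.
std::size_t example() {
    SnapshotAPI api;
    api.update(1);
    std::shared_ptr<const Data> seen = api.retrieveData();  // snapshot holding {1}
    api.update(2);                                          // publishes {1, 2}
    assert(api.retrieveData()->values.size() == 2);
    return seen->values.size();
}
```

The application never blocks the API thread for longer than a pointer copy, and an old snapshot stays valid until its last reader drops it.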
Using locks around a mutable data structure. As pointed out in the question, this requires the API user to perform extra actions that may not be obvious. But smart pointers can lessen the burden on consumers:
- One way to make it easier is to return a custom smart pointer that unlocks in its destructor. The unlock happens when the caller's scope closes, so there are no unlock calls for the caller to worry about. The API consumer would pass the pointer by reference to its methods, for example func(locking_ptr& ptr). A simple implementation can be found here: https://stackoverflow.com/a/15876719/1617480.
- To be able to pass that locking smart pointer by copy instead of by reference, some sort of reference counting scheme needs to be in place. You'd probably want to use shared_ptr internally in the locking smart pointer to avoid rolling your own thread-safe reference counting. More simply, pass a custom deleter to shared_ptr that unlocks and deletes (no need to write a smart pointer at all).
- Another type of smart pointer would surround every dereference (->) in locks. I don't think this is appropriate for this use case, since it looks like the API consumer wants a consistent view of the results. Here's an example of such a pointer: https://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Execute-Around_Pointer
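
To illustrate the first bullet in this list (a pointer that unlocks in its destructor), here is a minimal sketch. It is not the implementation from the linked answer. LockedData, countValues(), update(), example(), and the Data layout are hypothetical names. The handle wraps a std::unique_lock, so it is movable but not copyable, and consumers pass it by reference.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical payload; the question does not show the real Data type.
struct Data {
    std::vector<int> values;
};

// Hypothetical handle: holds the lock for as long as it is alive.
class LockedData {
public:
    LockedData(Data& data, std::mutex& mutex) : mData(&data), mLock(mutex) {}

    // Movable but not copyable, because std::unique_lock is move-only.
    LockedData(LockedData&&) = default;
    LockedData& operator=(LockedData&&) = default;

    Data* operator->() const { return mData; }
    Data& operator*() const { return *mData; }

private:
    Data* mData;
    std::unique_lock<std::mutex> mLock;  // released when the handle is destroyed
};

class API {
public:
    LockedData retrieveData() { return LockedData(mData, mDataMutex); }

    // Writer side (the API thread in the question); takes the same lock.
    void update(int value) {
        std::lock_guard<std::mutex> lock(mDataMutex);
        mData.values.push_back(value);
    }

private:
    Data mData;
    std::mutex mDataMutex;
};

// Consumers take the handle by reference, as in func(locking_ptr& ptr).
std::size_t countValues(const LockedData& data) { return data->values.size(); }

std::size_t example() {
    API api;
    api.update(1);
    {
        LockedData data = api.retrieveData();  // locked here
        data->values.push_back(2);
        assert(countValues(data) == 2);
    }                                          // unlocked here, no explicit call
    api.update(3);  // would block forever if the lock had leaked
    return countValues(api.retrieveData());
}
```

A moved-from handle no longer owns the lock, so it should not be dereferenced. If handles need to be copied freely, the shared_ptr with a custom deleter from the TL;DR avoids that restriction.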