c++ multithreading boost thread-local-storage

thread_local data for non static data members, once more

Problem

C++11 introduced thread_local, which provides thread-local data but cannot be applied to non-static data members.

This leads to the question:
Why may thread_local not be applied to non-static data members and how to implement thread-local non-static data members?

And the answer:

... I suggest using boost::thread_specific_ptr. You can put one of these inside your class and get the behavior you want.

But the boost::thread_specific_ptr destructor has the following note attached:

All the thread specific instances associated to this thread_specific_ptr (except maybe the one associated to this thread) must be null.

Is there a way to work around this?
I need thread-local storage for non-static data members that frees all per-thread data on destruction, even if threads are still running (or at least a TLS that doesn't fail or leak on destruction).
If boost::thread_specific_ptr is not the right choice for this, could I use a mutex-protected std::vector instead?

Background

I have a thread-safe class which receives data from MongoDB.

class JsonTable
{
    public:
    std::string getData() const;
    //....
    private:
    ThreadLocalStorage<mongocxx::client> _clients;
    //....
};

mongocxx::client must not be shared across multiple threads. Thus, in order to make getData thread-safe, I need to construct one mongocxx::client per thread. And when my JsonTable object is destroyed, I would like all clients to be closed/destroyed, even if the threads which initially created them are still running.

Solution

Make a thread-local static map from pointer-to-this to a wrapper around a shared_ptr to a shared_ptr to the data.

Make a non-static, synchronized list of shared_ptr to shared_ptr to the data.

Populate the thread-local map on demand. When you add an entry, also add its outer shared_ptr to the instance's list.

At object destruction, use atomic shared_ptr operations to clear the inner shared_ptr in every element of the list. This deletes the thread-local per-instance data.

The wrapper around the double shared_ptr also uses atomic operations to clear the inner shared_ptr. This clears the data if the thread dies.

The instance and the thread together share one shared_ptr (the inner one), whose lifetime is managed by the outer shared_ptr, so the inner pointer object itself dies only once both the thread and the instance have gone away.

The only synchronization occurs when the object is destroyed or when a new thread accesses the object for the first time, which should keep performance solid.
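As a rough illustration of the scheme described above, here is a minimal sketch. The names InstanceLocal, ThreadEntry, get and distinctPerThread are made up for this example, and get() takes a factory because I am assuming the payload (a mongocxx::client, say) has to be constructed lazily per thread. It uses the C++11 free functions std::atomic_load / std::atomic_store for shared_ptr, which are deprecated as of C++20 in favour of std::atomic<std::shared_ptr<T>>. Treat it as a starting point, not a finished implementation.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical name; T is the per-thread payload (e.g. a client object).
template <class T>
class InstanceLocal {
    using Inner = std::shared_ptr<T>;      // owns the data; cleared atomically
    using Slot  = std::shared_ptr<Inner>;  // keeps the Inner object itself alive

    // Thread-side wrapper: drops the data when the owning thread exits.
    struct ThreadEntry {
        Slot slot;
        ~ThreadEntry() { if (slot) std::atomic_store(slot.get(), Inner{}); }
    };

    // One map per thread (and per T), keyed by instance address.
    static std::unordered_map<const InstanceLocal*, ThreadEntry>& threadMap() {
        thread_local std::unordered_map<const InstanceLocal*, ThreadEntry> map;
        return map;
    }

    mutable std::mutex mutex_;
    mutable std::vector<Slot> slots_;  // every slot handed out by this instance

public:
    InstanceLocal() = default;
    InstanceLocal(const InstanceLocal&) = delete;
    InstanceLocal& operator=(const InstanceLocal&) = delete;

    // Clears the data of all threads, including ones that are still running.
    ~InstanceLocal() {
        std::lock_guard<std::mutex> lock(mutex_);
        for (auto& slot : slots_) std::atomic_store(slot.get(), Inner{});
    }

    // Returns the calling thread's object, creating it with make() on first use.
    template <class Factory>
    std::shared_ptr<T> get(Factory make) const {
        ThreadEntry& entry = threadMap()[this];
        if (entry.slot) {
            if (Inner data = std::atomic_load(entry.slot.get())) return data;
        }
        // First access, or a stale entry left by a dead instance at this address.
        Inner data = std::make_shared<T>(make());
        entry.slot = std::make_shared<Inner>(data);
        std::lock_guard<std::mutex> lock(mutex_);
        slots_.push_back(entry.slot);
        return data;
    }
};

// Usage sketch: two threads asking the same instance get distinct objects.
inline bool distinctPerThread() {
    InstanceLocal<int> tls;
    auto mine = tls.get([] { return 1; });
    std::shared_ptr<int> theirs;
    std::thread worker([&] { theirs = tls.get([] { return 2; }); });
    worker.join();
    assert(mine == tls.get([] { return 1; }));  // cached for this thread
    return mine != theirs && *mine == 1 && *theirs == 2;
}
```

Note that get() hands out a shared_ptr<T> copy, so an object that a thread is using at the moment the instance is destroyed stays alive until that thread lets go of it. Calling get() on an instance while its destructor is running is still undefined behaviour, as with any other member function.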

The map entries (with empty data) outlast the instance. If you want, you can spend some effort cleaning them up periodically. They could become a problem if you have many transient instances interacting with many long-lived threads. In that case, add an atomic counter of in-use vs. cleared data, and when the number of cleared entries gets high, do a pass to remove them when you add a new entry.
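The cleanup pass could look roughly like the sketch below. Slot, pruneCleared and pruneIfAbove are hypothetical names, and I am assuming the thread-local map stores the outer shared_ptr directly and that whoever clears a slot from the instance side bumps the counter exactly once per slot. The counter here is shared by all threads while each thread only prunes its own map, so a thread with nothing stale will still scan while the count is high; a per-thread heuristic may suit you better.

```cpp
#include <atomic>
#include <cstddef>
#include <memory>
#include <unordered_map>

template <class T>
using Slot = std::shared_ptr<std::shared_ptr<T>>;  // outer -> inner -> data

// Erase entries whose inner pointer has been cleared; returns how many went.
template <class Key, class T>
std::size_t pruneCleared(std::unordered_map<Key, Slot<T>>& map) {
    std::size_t removed = 0;
    for (auto it = map.begin(); it != map.end();) {
        if (!it->second || !std::atomic_load(it->second.get())) {
            it = map.erase(it);
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}

// Call just before inserting a new entry: scan only if enough slots were cleared.
template <class Key, class T>
std::size_t pruneIfAbove(std::unordered_map<Key, Slot<T>>& map,
                         std::atomic<std::size_t>& cleared,
                         std::size_t threshold) {
    if (cleared.load(std::memory_order_relaxed) < threshold) return 0;
    std::size_t removed = pruneCleared(map);
    cleared.fetch_sub(removed, std::memory_order_relaxed);
    return removed;
}
```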