Why is std::shared_ptr::unique() deprecated?

What is the technical problem with std::shared_ptr::unique() that is the reason for its deprecation in C++17?

According to cppreference.com, std::shared_ptr::unique() is deprecated in C++17 as

this function is deprecated as of C++17 because use_count is only an approximation in multi-threaded environment.

~~I understand this to be true for use_count() > 1: While I'm holding a reference to it, someone else might simultaneously let go of his or create a new copy.~~

But if use_count() returns 1 (which is what I'm interested in when calling unique()) then there is no other thread that could change that value in a racy way, so I would expect that this should be safe:

if (myPtr && myPtr.unique()) { //Modify *myPtr }

Results from my own search:

I found this document: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0521r0.html which proposes the deprecation in response to C++17 CD comment CA 14, but I couldn't find said comment itself.

As an alternative, that paper proposed adding some notes including the following:

Note: When multiple threads can affect the return value of use_count(), the result should be treated as approximate. In particular, use_count() == 1 does not imply that accesses through a previously destroyed shared_ptr have in any sense completed. — end note

I understand that this might be the case for the way use_count() is currently specified (due to the lack of guaranteed synchronization), but why was the resolution not just to specify such synchronization and hence make the above pattern safe? If there was a fundamental limitation that wouldn't allow such synchronization (or make it forbiddingly costly), then how is it possible to correctly implement the destructor?

Update:

I overlooked the obvious case presented by @alexeykuzmin0 and @rubenvb, because so far I only used unique() on instances of shared_ptr that were not accessible to other threads themselves. So there was no danger that that particular instance would get copied in a racy way.

I still would be interested to hear what exactly CA 14 was about, because I believe that all my use cases for unique() would work as long as it is guaranteed to synchronize with whatever happens to different shared_ptr instances on other threads. So it still seems like a useful tool to me, but I might overlook something fundamental here.

To illustrate what I have in mind, consider the following:

class MemoryCache {
public:
    MemoryCache(size_t size)
        : _cache(size)
    {
        for (auto& ptr : _cache) {
            ptr = std::make_shared<std::array<uint8_t, 256>>();
        }
    }

    // the returned chunk of memory might be passed to a different thread(s),
    // but the function is never accessed from two threads at the same time
    std::shared_ptr<std::array<uint8_t,256>> getChunk()
    {
        auto it = std::find_if(_cache.begin(), _cache.end(), [](auto& ptr) { return ptr.unique(); });
        if (it != _cache.end()) {
            //memory is no longer used by previous user, so it can be given to someone else
            return *it;
        } else {
            return{};
        }
    }
private:
    std::vector<std::shared_ptr<std::array<uint8_t, 256>>> _cache;
};

Is there anything wrong with it (if unique() would actually synchronize with the destructors of other copies)?

Solution

I think that P0521R0 solves potentially data race by misusing shared_ptr as inter-thread synchronization. It says use_count() returns unreliable refcount value, and so, unique() member function will be useless when multithreading.

int main() {
  int result = 0;
  auto sp1 = std::make_shared<int>(0);  // refcount: 1

  // Start another thread
  std::thread another_thread([&result, sp2 = sp1]{  // refcount: 1 -> 2
    result = 42;  // [W] store to result
    // [D] expire sp2 scope, and refcount: 2 -> 1
  });

  // Do multithreading stuff:
  //   Other threads may concurrently increment/decrement refcounf.

  if (sp1.unique()) {      // [U] refcount == 1?
    assert(result == 42);  // [R] read from result
    // This [R] read action cause data race w.r.t [W] write action.
  }

  another_thread.join();
  // Side note: thread termination and join() member function
  // have happens-before relationship, so [W] happens-before [R]
  // and there is no data race on following read action.
  assert(result == 42);
}

The member function unique() does not have any synchronization effect and there're no happens-before relationship from [D] shared_ptr's destructor to [U] calling unique(). So we cannot expect relationship [W] ⇒ [D] ⇒ [U] ⇒ [R] and [W] ⇒ [R]. ('⇒' denotes happens-before relationship).

EDITED: I found two related LWG Issues; LWG2434. shared_ptr::use_count() is efficient, LWG2776. shared_ptr unique() and use_count(). It is just a speculation, but WG21 Committee gives priority to the existing implementation of C++ Standard Library, so they codify its behavior in C++1z.

LWG2434 quote (emphasis mine):

shared_ptr and weak_ptr have Notes that their use_count() might be inefficient. This is an attempt to acknowledge reflinked implementations (which can be used by Loki smart pointers, for example). However, there aren't any shared_ptr implementations that use reflinking, especially after C++11 recognized the existence of multithreading. Everyone uses atomic refcounts, so use_count() is just an atomic load.

LWG2776 quote (emphasis mine):

The removal of the "debug only" restriction for use_count() and unique() in shared_ptr by LWG 2434 introduced a bug. In order for unique() to produce a useful and reliable value, it needs a synchronize clause to ensure that prior accesses through another reference are visible to the successful caller of unique(). Many current implementations use a relaxed load, and do not provide this guarantee, since it's not stated in the standard. For debug/hint usage that was OK. Without it the specification is unclear and probably misleading.

[...]

I would prefer to specify use_count() as only providing an unreliable hint of the actual count (another way of saying debug only). Or deprecate it, as JF suggested. We can't make use_count() reliable without adding substantially more fencing. We really don't want someone waiting for use_count() == 2 to determine that another thread got that far. And unfortunately, I don't think we currently say anything to make it clear that's a mistake.

This would imply that use_count() normally uses memory_order_relaxed, and unique is neither specified nor implemented in terms of use_count().