Partial template specialization of std::atomic for smart pointers

Background

Since C++11, atomic operations on std::shared_ptr can be done via std::atomic_... methods found here, because the partial specialization as shown below is not possible:

std::atomic<std::shared_ptr<T>>

This is due to the fact that std::atomic only accepts TriviallyCopyable types, and std::shared_ptr (or std::weak_ptr) is not trivially copyable.

However, as of C++20, these methods have been deprecated, and got replaced by the partial template specialization of std::atomic for std::shared_ptr as described here.

Question

I am not sure of

Why std::atomic_... got replaced.
Techniques used to enable the partial template specialization of std::atomic for smart pointers.

Solution

Several proposals for atomic<shared_ptr> or something of that nature explain a variety of reasons. Of particular note is P0718, which tells us:

The C++ standard provides an API to access and manipulate specific shared_ptr objects atomically, i.e., without introducing data races when the same object is manipulated from multiple threads without further synchronization. This API is fragile and error-prone, as shared_ptr objects manipulated through this API are indistinguishable from other shared_ptr objects, yet subject to the restriction that they may be manipulated/accessed only through this API. In particular, you cannot dereference such a shared_ptr without first loading it into another shared_ptr object, and then dereferencing through the second object.

N4058 explains a performance issue with regard to how you have to go about implementing such a thing. Since shared_ptr is typically bigger than a single pointer in size, atomic access typically has to be implemented with a spinlock. So either every shared_ptr instance has a spinlock even if it never gets used atomically, or the implementation of those atomic functions has to have a lookaside table of spinlocks for individual objects. Or use a global spinlock.

None of these are problems if you have a type dedicated to being atomic.

atomic<shared_ptr> implementations can use the usual techniques for atomic<T> when T is too large to fit into a CPU atomic operation. They get to get around the TriviallyCopyable restriction by fiat: the standard requires that they exist and be atomic, so the implementation makes it so. C++ implementations don't have to play by the same rules as regular C++ programs.