atomic<> within a union as a performance hack

I am considering implementing a homebrew shared pointer as part of a garbage collector, in order to avoid the (probably slight) overhead from std::shared_ptr being internally atomic. Something loosely equivalent to:

template <typename T>
struct my_atomic_shared
{
  std::atomic<std::size_t> refcount;
  std::unique_ptr<T> value;
};

My brief hope was that small integers and atomic<small integers> would be equivalent, but dumping asm shows mov for the std::size_t vs xchgl for the atomic. I am now considering the following, possibly UB invoking, implementation:

template <typename T>
struct my_normal_shared
{
  std::size_t refcount;
  std::unique_ptr<T> value;
};

template <typename T>
struct my_shared
{
  // Surprisingly tedious constructor/destructor definition omitted
  enum {ATOMIC, NORMAL} tag;
  union
  {
    my_atomic_shared<T> atomic;
    my_normal_shared<T> normal;
  } value;
  void promote()
  {
    // pseudocode. Sequential consistency, calls should go via store/load
    if(tag == NORMAL)
    {
      T got = *(value.normal);
      *(value.atomic) = got;
      tag = ATOMIC;
    }
  }
};

The premise would be to create the non-atomic version by default and call promote() before passing an instance of my_shared<> to another thread.

I would like to know if there is a better way to achieve this performance hack. I'm also be interested to know if there's a reason why putting atomic<> variables in unions is doomed to failure.

edit: Adding less incoherent pseudocode. The type stored is more interesting than the type used for the reference count.

template <typename T>
struct my_shared_state
{
  enum {ATOMIC, NORMAL} tag;
  union
  {
    std::atomic<std::size_t> atomic;
    std::size_t normal;
  } refcount;
  T value;
  void incr();
  void decr();
}

template <typename T>
struct my_shared_pointer
{
   my_shared_state<T> * state;
   // ctors and dtors modify refcount held in state
 }

Solution

First, in shared_ptr and other reference counted pointer designs, the reference count is not in the shared_ptr class. It's in the node to which the shared pointer points.

The relationship is thus:

template<typename T>
class Node
{
 std::atomic<unsigned int> RefCount;
 T * obj;
};

template<typename T>
class shared_ptr
{
 Node<T> * node;
 T *       shadow;
};

This is not code for a shared_ptr, but pseudo code for the data structure layout. The shared_ptr points to a Node<T>, which "owns" the object created dynamically, and the reference counter of all the shared_ptr's which have "linked" to it. The shared_ptr<T> has a "shadow" pointer to the Node<T>'s obj pointer.

That's the basic design layout. The atomic increment/decrement is done through a dereference, which is nearly as much overhead as you might think to remove from switching to a non-atomic increment/decrement counter.

I assume the objective is to permit a non-atomic reference counted ownership that may be used in non-threaded work, such that the reference counter need not be synchronized.

Note that the choice is not within shared_ptr, but within Node.

Further, a most important performance enhancement was adopted from boost::shared_ptr in recent C++11/C++14 libraries, embodied in the template function make_shared.

The concept is based on the observation that allocating this structure implies at least two allocations, one for the object being created with new, the other for the Node that will own it (the shared_ptr is presumed to be on the stack, or a member). As such, there was a design implemented in boost (then adopted by C++11), via the make_shared function, which performs one allocation for both the Node and the object being created, which uses "in place construction" and destruction of the object being managed (the T in all of this). This optimization is likely at least as much a performance enhancement, and a memory efficiency optimization on top of that, which you'd loose unless you choose to implement that as well.

Note, too, that garbage collection (of a kind) is now available in compliant C++11/C++14 releases, so one should ask if that's been inspected and rejected as unsuitable before proceeding.

Assuming that is rejected, you're inquiry regarding a union is possible, but another approach offers possibilities and flexibility you may not have yet considered.

Using policy based design, you can create compile time polymorphic options by building template classes out of parameters. The basic idea is to derive from one of the parameters in the template class. For example:

class AtomicRefCount
{ ...
};

class NonAtomicRefCount
{ ...
};

template< typename T, typename Ref >
class Node : public Ref
{ ...
};

typedef Node< SomeStruct, AtomicRefCount >  SomeStruct_Node;

The idea is to be able to select behaviors or construction options based on a parameter to the template. They can be nested for deep chaining of this concept. In this example the idea is to create a Node based on the option of an atomic or non-atomic reference counted integer type.

The challenge, however, is that this means the two kinds of nodes are different types. They are not 'Node< T >' anymore. They are 'Node< T, Ref>' types, and so shared_ptr<T> wouldn't understand them, they'd have to be shared_ptr< T, Ref >

However, using this technique it's possible to design both shared_ptr and weak_ptr from a common base of template interface declarations, which different behaviors based on which parameters are provided in declaring the pointers. In a smart pointer library I wrote years ago, in order to handle some various issues INCLUDING garbage collection as an option, these were possible:

template< typename T > struct MetaPtrTypes
{
  typedef typename SmartPointers::LockPolicy< T, SmartPointers::StrongAttachmentPolicy >    StrongLocking;
  typedef typename SmartPointers::NoLockPolicy< T, SmartPointers::WeakAttachmentPolicy >    WeakNoLocking;
  typedef SmartPointers::LockPolicy< T, SmartPointers::WeakAttachmentPolicy >               WeakLocking;
  typedef SmartPointers::NoLockPolicy< T, SmartPointers::StrongAttachmentPolicy >           StrongNoLocking;
  typedef SmartPointers::PublicAccessPolicy< T, StrongNoLocking >                           PublicStrongNoLock;


  typedef SmartPointers::MetaPtr< T, StrongLocking >          LPtr;
  typedef SmartPointers::MetaPtr< T, WeakNoLocking >          WPtr;
  typedef SmartPointers::MetaPtr< T, WeakLocking >            WLPtr;

  typedef SmartPointers::MetaPtr< T, PublicStrongNoLock >     MPtr;

  typedef T                                                   Type;
};

This was for C++03 or so, as we waited for C++11 features to creep in from the drafts. The idea was to create a MetaPtrTypes< SomeClass > SomeClassPtrs; Doing so gave types like SomeClassPtrs::MPtr, which is a kind of shared_ptr. The WPtr was a weak pointer, both somewhat similar to std::shared_ptr and std::weak_ptr, but with several custom memory allocation options not available to shared_ptr of the period, and with certain locking features which usually required protecting shared_ptr's with mutexes in applications (because writing to shared_ptr is not thread safe).

Notice the strategy of policy based design. MPtr is the equivalent of, say:

typedef std::shared_ptr< SomeClass >  SPtr;

Wherever one could use SPtr, one could use MPtr. But look how MPtr is fashioned. It's a MetaPtr< T, PublicStrongNoLock >. This is a built up policy construction paradigm. MetaPtr is just like the Node< T, Ref > mentioned earlier, but with SEVERAL policies embedded inside it.

Looking over MPtr, WPtr, LPtr, you'll notice there are MetaPtr creations based on various policies, StrongLocking and WeakLocking among them.

They are:

  typedef typename SmartPointers
     ::LockPolicy< T, SmartPointers
          ::StrongAttachmentPolicy >    StrongLocking;

  typedef SmartPointers
     ::LockPolicy< T, SmartPointers
          ::WeakAttachmentPolicy >      WeakLocking;

These are two policies from which smart pointers can be fashioned. Notice that for WLPtr, the policy is WeakLocking, while for LPtr the policy is for StrongLocking.

They are both made from MetaPtr, the primary user interface class. If MetaPtr is fashioned with a Weak kind of policy, it's a weak_ptr. If it's fashioned with a Strong kind of policy, it's a shared_ptr. The difference is known in this library as the attachment policy, and happens to be the root class from which the hierarchy is derived. Between the attachment policy and the MetaPtr is a locking policy. These two are lockers, StrongLocking and WeakLocking. There are non-lockers, StrongNoLock and WeakNoLock.

Several different kinds of smart pointers can be fashioned from a few small template classes implementing no less than 6 different kinds of smart pointers, all based on the same interface and sharing much of the code.

Policy based design is one way to implement what you want, without resorting to the union, though that's not a bad choice. It's simple, but if you more options intended in your design, you should think about policy based design.

More about policy based design of smart pointers is to be found in Alexandrescu's book from 2001 in which he presents loki, a smart pointer based on policies.

In the example presented, the MetaPtr was intended for a number of high performance scenarios where custom memory allocation was required, but shared_ptr of the period did not support it (and wouldn't for many years). Among the options, selectable by policy:

Standard memory allocation
Custom memory allocation
Garbage collection/memory management
Fast locking of writeable pointers
Lightweight reference counted smart pointers
Non-reference counted smart pointers (something like unique_ptr)
Array/Container aware smart pointers
GPU resource managers (loading/unloading textures,models and shader code)

Many of these were selectable in various combinations, all available in weak and standard versions.