c++, c++11, unions

Unrestricted union in practice


I have some questions about unrestricted unions and their application in practice. Suppose I have the following code:

#include <new>     // placement new
#include <vector>

struct MyStruct
{
    MyStruct(const std::vector<int>& a) : array(a), type(ARRAY)
    {}
    MyStruct(bool b) : boolean(b), type(BOOL)
    {}
    MyStruct(const MyStruct& ms) : type(ms.type)
    {
        if (type == ARRAY)
            new (&array) std::vector<int>(ms.array);
        else
            boolean = ms.boolean;
    }
    MyStruct& operator=(const MyStruct& ms)
    {
        if (&ms != this) {
            if (type == ARRAY)
                array.~vector<int>(); // EDIT(2) 
            if (ms.type == ARRAY)
                new (&array) std::vector<int>(ms.array);
            else
                boolean = ms.boolean;
            type = ms.type;
        }
        return *this;
    }
    ~MyStruct()
    {
        if (type == ARRAY)
            array.~vector<int>();
    }

    union {
        std::vector<int> array;
        bool             boolean;
    };
    enum {ARRAY, BOOL} type;
};
  1. Is this code valid?
  2. Is it necessary to explicitly call the vector destructor each time we use the boolean (as stated here: http://cpp11standard.blogspot.com/2012/11/c11-standard-explained-1-unrestricted.html)?
  3. Why is a placement new required instead of just doing something like 'array = ms.array'?

EDIT:

  • Yes, it compiles
  • "Members declared inside anonymous unions are actually members of the containing class, and can be initialized in the containing class's constructor." (C++11 anonymous union with non-trivial members)
  • Adding explicit destructor calls as suggested leads to SIGSEGV with g++ 4.8 / clang 4.2

Solution

    1. The code's buggy: change array.clear(); to array.~vector<int>();

    Explanation: operator= was using placement new over an object that hadn't been destructed. In principle that could do anything, but in practice you can expect it to leak the dynamic memory the previous array had been using: clear() doesn't release memory or change capacity, it just destroys the elements and sets the size to zero.
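The point about clear() can be checked with a small sketch. The helper name capacity_after_clear is hypothetical, and the behaviour shown is what mainstream standard library implementations do in practice:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: reports the capacity a vector still holds after clear().
std::size_t capacity_after_clear(std::size_t n)
{
    std::vector<int> v(n, 42);  // allocates room for at least n ints
    v.clear();                  // destroys the elements; size becomes 0
    assert(v.empty());
    return v.capacity();        // the heap buffer is still owned by v
}
```

If a new vector is then placement-newed over v without running v's destructor, nothing ever frees that retained buffer.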

    From 9.5/2:

    If any non-static data member of a union has a non-trivial default constructor (12.1), copy constructor (12.8), move constructor (12.8), copy assignment operator (12.8), move assignment operator (12.8), or destructor (12.4), the corresponding member function of the union must be user-provided or it will be implicitly deleted (8.4.3) for the union.

    So the vector's constructor, destructor, etc. never kick in by themselves: you must call them explicitly when wanted.

    In 9.5/3 there's an example:

    Consider the following union:

    union U {
        int i;
        float f;
        std::string s;
    };
    

    Since std::string (21.3) declares non-trivial versions of all of the special member functions, U will have an implicitly deleted default constructor, copy/move constructor, copy/move assignment operator, and destructor. To use U, some or all of these member functions must be user-provided.

    That last bit - "To use U, some or all of these member functions must be user-provided." - seems to presume that U needs to coordinate its own vaguely value-semantic behaviour, but in your case the surrounding struct is doing that, so you don't need to define any of these union member functions.
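For contrast, here is a minimal sketch of what "user-provided" could look like for the named union U from the standard's example. The constructor and destructor bodies and the demo_length function are assumptions for illustration, not part of the standard's text:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

union U {
    int i;
    float f;
    std::string s;
    U() : i(0) {}  // user-provided: starts with i as the active member
    ~U() {}        // user-provided: does nothing, so the owner must destroy s if it is active
};

// Hypothetical demo: switch the active member to s and back again by hand.
std::size_t demo_length()
{
    U u;                               // i is active
    new (&u.s) std::string("hello");   // construct s in place; s is now active
    std::size_t n = u.s.size();
    u.s.~basic_string();               // end s's lifetime explicitly
    u.i = 7;                           // i is active again
    assert(u.i == 7);
    return n;
}
```

The union itself does not know which member is active, which is why the question's struct keeps a separate type tag next to it.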

    2: We must call the array destructor whenever an array value is being replaced by a boolean value. If operator= placement-news a new array value instead of assigning it, then the old array must also have its destructor called first. When both sides already hold an array, using std::vector's own operator= would be more efficient, because the existing memory can be reused if it is sufficient for all the elements being copied. Basically, you must match constructions and destructions. UPDATE: the example code has a bug as per your comment below.
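One way to match constructions and destructions is sketched below. The name Variant is hypothetical and mirrors MyStruct from the question. Treat it as an illustration under those assumptions rather than a hardened implementation (it does not try to be exception-safe, for instance):

```cpp
#include <cassert>
#include <new>
#include <vector>

struct Variant  // hypothetical name; same layout as MyStruct in the question
{
    enum Type { ARRAY, BOOL };

    Variant(const std::vector<int>& a) : array(a), type(ARRAY) {}
    Variant(bool b) : boolean(b), type(BOOL) {}
    Variant(const Variant& other) : type(other.type)
    {
        if (type == ARRAY)
            new (&array) std::vector<int>(other.array);  // nothing constructed yet
        else
            boolean = other.boolean;
    }
    Variant& operator=(const Variant& other)
    {
        if (this == &other)
            return *this;
        if (type == ARRAY && other.type == ARRAY) {
            array = other.array;  // both vectors are live: plain assignment, may reuse capacity
        } else if (other.type == ARRAY) {
            new (&array) std::vector<int>(other.array);  // bool -> array: construct in place
        } else {
            if (type == ARRAY)
                array.~vector();  // array -> bool: destroy the vector first
            boolean = other.boolean;
        }
        type = other.type;
        return *this;
    }
    ~Variant()
    {
        if (type == ARRAY)
            array.~vector();
    }

    union {
        std::vector<int> array;
        bool             boolean;
    };
    Type type;
};
```

Each placement new is paired with exactly one destructor call, and the vector's operator= is only used when a vector is already alive on the left-hand side.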

    3: Why is a placement new required instead of just doing something like 'array = ms.array'?

    array = ms.array invokes std::vector<int>::operator=, which always assumes the this pointer addresses an already properly constructed object. Inside that object you can expect there to be a pointer which will either be null or refer to heap storage. If your object hasn't been constructed, that pointer holds garbage, and operator= may well call a memory deallocation function on the bogus pointer. Placement new says "ignore the current content of the memory that this object will occupy, and construct a new object with valid members from scratch".
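The difference can be seen in isolation with raw storage. This is a minimal sketch, and the function name demo_sum and the buffer name storage are made up for illustration:

```cpp
#include <cassert>
#include <new>
#include <vector>

// Hypothetical demo: a vector must be constructed in raw storage before it can be assigned to.
int demo_sum()
{
    alignas(std::vector<int>) unsigned char storage[sizeof(std::vector<int>)];

    // No vector lives in storage yet, so calling operator= on it here would be
    // undefined behaviour. Placement new constructs one from scratch instead.
    std::vector<int>* v = new (storage) std::vector<int>{1, 2};

    // *v is now a live object, so ordinary assignment is fine.
    *v = std::vector<int>{10, 20, 30};
    assert(v->size() == 3);

    int sum = 0;
    for (int x : *v)
        sum += x;

    v->~vector();  // matches the placement new above
    return sum;
}
```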