c++ casting language-lawyer union undefined-behavior

Why is casting a pointer to a union member to a pointer to the union not UB when they can have different sizes?


In the following code snippet, a buffer overflow occurs at the assignment u->b = *b;

class A {
public:
  int x;
  A() {
    x = 5;
  }
};

class B {
public:
  int x;
  float y;
  B() {
    x = 10;
    y = 3.14;
  }
};

union U {
  A a;
  B b;
};

U* foo(U* u) {
  B* b = new B();
  u->b = *b;
  return u;
}

int main() {
  A* a  = new A();
  U* u = (U*) a;
  u = foo(u);
  return u->b.y;
}

From what I read here, the cast (U*) a should be well defined. To my understanding it should not be, since sizeof(U) != sizeof(A).

It seems that U and A are pointer-interconvertible, but I'm unsure whether this implies that the cast is well defined.

I'm having trouble following the standard in this case, any help would be welcome!

*** EDIT *** I do not plan to use this code, but I have seen people activate a union member in this way. For example here


Solution

  • Objects of type U are pointer-interconvertible with their subobject U::a. This can be tested with std::is_pointer_interconvertible_with_class(&U::a).
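As a rough illustration, the check can be written as a compile-time assertion. This is only a sketch: the feature-test guard is an assumption about toolchain support (the trait is C++20 and not every standard library ships it), and the helper name check_sizes is made up for this example.

```cpp
#include <cassert>
#include <type_traits>
#include <version>

// Same layout as the types in the question, repeated so the block stands alone.
struct A { int x; A() : x(5) {} };
struct B { int x; float y; B() : x(10), y(3.14f) {} };
union U { A a; B b; };

#ifdef __cpp_lib_is_pointer_interconvertible
// Every U object is pointer-interconvertible with its member a.
static_assert(std::is_pointer_interconvertible_with_class(&U::a));
#endif

// Hypothetical helper: checks the size relationship the question worries about.
void check_sizes() {
    assert(sizeof(U) >= sizeof(B));  // a union is at least as large as each member
    assert(sizeof(U) > sizeof(A));   // true on mainstream ABIs, not a standard guarantee
}
```

Pointer-interconvertibility says nothing about size: the trait can hold while sizeof(U) is still larger than sizeof(A), which is exactly the situation in the question.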

    The C-style cast (U*) a performs reinterpret_cast<U*>(a), which is defined as static_cast<U*>(static_cast<void*>(a)). For that conversion, [expr.static.cast]p14 says:

    Otherwise, if the original pointer value points to an object a, and there is an object b of type similar to T that is pointer-interconvertible with a, the result is a pointer to b.

    There might not be an object of type U at that address, in which case you don't get a pointer to a U, just a pointer of type U* that still points to *a.
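To make that distinction concrete, here is a small sketch (the function name casts_agree is invented for this example) showing that both spellings of the cast carry the same address as the original A*. It only compares pointer values and never dereferences the U*, which is the step that would need a real U object.

```cpp
#include <cassert>

// Same layout as the types in the question, repeated so the block stands alone.
struct A { int x; A() : x(5) {} };
struct B { int x; float y; B() : x(10), y(3.14f) {} };
union U { A a; B b; };

// Hypothetical helper: compares the two cast spellings without ever
// reading or writing through the resulting U*.
bool casts_agree() {
    A a;
    U* via_reinterpret = reinterpret_cast<U*>(&a);
    U* via_static = static_cast<U*>(static_cast<void*>(&a));
    // Both results hold the address of a. No U object exists here,
    // so neither pointer may be used to access a U.
    return via_reinterpret == via_static
        && static_cast<void*>(via_static) == static_cast<void*>(&a);
}

// Usage: assert(casts_agree());
```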

    Subsequently accessing it as if it were a U is undefined behaviour (for the reasons given in [basic.lval]p11), so u->b = *b; is undefined behaviour.
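For contrast, here is a sketch of one way to switch the active member without undefined behaviour: start from a real U object and begin the lifetime of b with placement new. The names activate_b and demo are hypothetical. Placement new is used rather than a plain assignment to u.b because, as far as I can tell from [class.union.general], assignment only starts a member's lifetime when the member's class has a trivial default constructor, which B does not.

```cpp
#include <cassert>
#include <new>

// Same layout as the types in the question, repeated so the block stands alone.
struct A { int x; A() : x(5) {} };
struct B { int x; float y; B() : x(10), y(3.14f) {} };
union U { A a; B b; };

// Hypothetical helper: makes b the active member of an existing U object.
// A and B have trivial destructors, so a needs no explicit destructor call.
B& activate_b(U& u) {
    return *new (&u.b) B();  // placement new begins the lifetime of u.b
}

float demo() {
    U u{A()};              // a real U object whose active member is a
    assert(u.a.x == 5);
    B& b = activate_b(u);  // b is now the active member
    assert(b.x == 10);
    return u.b.y;          // reading the active member is fine
}
```

The key difference from the question's code is that the storage here really is a U, so there is room for B and the member access goes through an object that exists.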


    However, it is possible that this could avoid UB. The operator new call can return more bytes than requested ([basic.stc.dynamic.allocation]p2, [new.delete.single]p3), operator new implicitly creates objects ([intro.object]p14), and std::is_implicit_lifetime_v<U> is true. That operator new call could therefore allocate enough bytes to store a U object, in which case it would implicitly create the U, making this well defined.

    A more standard way to do that would be this:

    A* a = static_cast<A*>(operator new(sizeof(U)));
    U* u = reinterpret_cast<U*>(a);
    u = foo(u);
    return u->b.y;
    

    This explicitly avoids the size problem you were worried about.