c++ language-lawyer undefined-behavior

Is the calculation of byte pointer offsets between non-array composition members defined?


I understand the reasoning behind expr.add#4.2, which restricts + and - to pointers into the same array; it was explained in several comments on my question about an offset pointer class. However, the same guarantee could be provided whenever both the pointer and the pointer-plus-offset target lie within the same (possibly nested) composition, not necessarily an array. If such differences could be arbitrary, the offsetof macro would be pointless.

Example demonstrating the offsetof analogy:

#include <iostream>
#include <cstddef>

struct Inner
{
    int i1;
    float f;
    bool b;
    int i2;
};

struct Outer
{
    int i1;
    char c;
    int i2;
    Inner inner;
    double d;
};

int main()
{
    Outer outer;
    // Distances obtained by subtracting byte pointers to different members (the expressions in question)
    std::cout << "reinterpret_cast<std::byte*>(&outer.i2) - reinterpret_cast<std::byte*>(&outer.i1): "
              << reinterpret_cast<std::byte*>(&outer.i2) - reinterpret_cast<std::byte*>(&outer.i1) << std::endl;
    std::cout << "reinterpret_cast<std::byte*>(&outer.inner.i2) - reinterpret_cast<std::byte*>(&outer.i1): "
              << reinterpret_cast<std::byte*>(&outer.inner.i2) - reinterpret_cast<std::byte*>(&outer.i1) << std::endl;
    // The same distances computed with offsetof, without any pointer arithmetic
    std::cout << "offsetof(Outer, i2) - offsetof(Outer, i1): " << offsetof(Outer, i2) - offsetof(Outer, i1) << std::endl;
    std::cout << "offsetof(Outer, inner) + offsetof(Inner, i2) - offsetof(Outer, i1): "
              << offsetof(Outer, inner) + offsetof(Inner, i2) - offsetof(Outer, i1) << std::endl;

    return 0;
}

prints:

reinterpret_cast<std::byte*>(&outer.i2) - reinterpret_cast<std::byte*>(&outer.i1): 8
reinterpret_cast<std::byte*>(&outer.inner.i2) - reinterpret_cast<std::byte*>(&outer.i1): 24
offsetof(Outer, i2) - offsetof(Outer, i1): 8
offsetof(Outer, inner) + offsetof(Inner, i2) - offsetof(Outer, i1): 24

All pointer targets lie within the object outer, but they are not the same object. Is this undefined behaviour?

(Edited after comments pointing out that non-byte pointers would break with mixed-type compositions, and that the "hypothetical" wording in expr.add#4.2 does not cover my case. Thanks!)

An argument in favor of defined behaviour could be:

  • Converting pointers to std::byte*, char* or unsigned char*, and accessing the object through them, is permitted by the strict aliasing rules.
  • The byte pointer to the outer composition itself points to the first element of a byte array A that represents the entire composition.
  • The semantics of offsetof require that every composition member has a fixed offset inside the composition.
  • Therefore, all byte pointers converted from composition members also point to elements of A.

Solution

  • Yes, this code has undefined behavior. Consider just the first subtraction:

    reinterpret_cast<std::byte*>(&outer.i2) - reinterpret_cast<std::byte*>(&outer.i1)
    

    i1 is the first member of outer and is therefore pointer-interconvertible with it. This means that via reinterpret_cast you can obtain a pointer to outer, and from there a pointer to the byte array that represents the outer object.

    However, i2 is not the first member, so that byte array cannot be reached through it. Put differently, every byte of storage in outer is reachable through a pointer to i1, but not through a pointer to i2. Therefore, according to [expr.add] p4, this pointer subtraction is UB. The "possibly-hypothetical" wording isn't relevant either: it only exists to permit arithmetic involving the (hypothetical) one-past-the-end element of an array.

    Note that offsetof could be implemented using similar arithmetic on std::byte*, but offsetof is special: the implementation guarantees that it works, whereas any attempt by the user to do the same carries no such guarantee.