
How does "struct inheritance" not violate the strict aliasing rule?


The "struct inheritance technique" in C (as described in this question) is made possible by the fact that the C standard guarantees that the first member of a struct will never have any padding before it (?), and that the address of the first member will always be equal to the address of the struct itself.

This allows usage such as the following:

typedef struct {
    // some fields
} A;

typedef struct {
    A base;
    // more fields
} B;

typedef struct {
    B base;
    // yet more fields
} C;

C* c = malloc(sizeof(C));
// ... init c or whatever ...
A* a = (A*) c;
// ... access stuff on a etc.
B* b = (B*) c;
// ... access stuff on b etc.
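For concreteness, here is a minimal compilable sketch of the layout guarantee the technique relies on. The field names (`id`, `b_field`, `c_field`) and the helper `casts_match_members` are hypothetical, since the snippet above leaves the fields unspecified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fields; the original only says "some fields". */
typedef struct { int id; } A;
typedef struct { A base; int b_field; } B;
typedef struct { B base; int c_field; } C;

/* The first member sits at offset 0, i.e. no padding precedes it. */
static_assert(offsetof(B, base) == 0, "A must start at offset 0 of B");
static_assert(offsetof(C, base) == 0, "B must start at offset 0 of C");

/* Returns 1 if the cast pointers equal the addresses of the nested
   first members, 0 otherwise (or if allocation fails). */
int casts_match_members(void) {
    C *c = malloc(sizeof *c);
    if (c == NULL) return 0;
    *c = (C){0};
    int same = ((B *)c == &c->base) && ((A *)c == &c->base.base);
    free(c);
    return same;
}
```

On a conforming implementation, `casts_match_members()` should return 1.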

This question has two parts:

A. It seems to me this technique breaks the strict aliasing rule. Am I wrong, and if so, why?

B. Suppose that this technique is indeed legal. In that case, does it make a difference whether (1) we first store the object in an lvalue of its specific type before downcasting or upcasting it to a different type, or (2) we cast it directly to the particular type desired at the moment, without first storing it in an lvalue of the specific type?

For example, are these three options all equally legal?

Option 1:

C* make_c(void) {
    return malloc(sizeof(C));
}

int main(void) {
    C* c = make_c(); // First store in an lvalue of the specific type
    A* a = (A*) c;
    // ... do stuff with a
    C* c2 = (C*) a; // Cast back to C
    // ... do stuff with c2

    return 0;
}

Option 2:

C* make_c(void) {
    return malloc(sizeof(C));
}

int main(void) {
    A* a = (A*) make_c(); // Don't store in an lvalue of the specific type, cast right away
    // ... do stuff with a
    C* c2 = (C*) a; // Cast back to C
    // ... do stuff with c2

    return 0;
}

Option 3:

int main(void) {
    A* a = (A*) malloc(sizeof(C)); // Don't store in an lvalue of the specific type, cast right away
    // ... do stuff with a
    C* c2 = (C*) a; // Cast to C - even though the object was never actually stored in a C* lvalue
    // ... do stuff with c2

    return 0;
}

Solution

  • A. It seems to me this technique breaks the strict aliasing rule. Am I wrong, and if so, why?

    Yes, you are wrong. I'll consider two cases:

    Case 1: The C is fully initialized

    That would be this, for example:

    C *c = malloc(sizeof(*c));
    *c = (C){0};  // or equivalently, "*c = (C){{{0}}}" to satisfy overzealous compilers
    

    In that case, all the bytes of the representation of a C are set, and the effective type of the object comprising those bytes is C. This comes from paragraph 6.5/6 of the standard:

    If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value.

    But structure and array types are aggregate types, which means that objects of such types contain other objects within them. In particular, each C contains a B identified as its member base. Because the allocated object is, at this point, effectively a C, it contains a sub-object that is effectively a B. One syntax for an lvalue referring to that B is c->base. The type of that expression is B, so it is consistent with the strict-aliasing rule to use it to access the B to which it refers. That has to be ok, else structures (and arrays) would not work at all, whether dynamically allocated or not.*

    But, as discussed in my answer to your previous question, (B *)c is guaranteed to be equal (in value and type) to &c->base. Thus *(B *)c is another lvalue referring to the B that is the first member of *c. That the syntax of that expression is different from that of the previous lvalue we considered is of no account. It is an lvalue of type B, associated with an object of type B, so using it to access the object to which it refers is one of the cases allowed by the SAR.

    None of this is any different from the statically and automatically allocated cases.
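As an illustration of Case 1, the following sketch initializes a whole C and then reads its parts through pointers obtained by casting. The field names and the function `sum_through_base_views` are assumptions made for the example, not part of the question.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical fields, chosen only so there is something to read. */
typedef struct { int id; } A;
typedef struct { A base; int b_field; } B;
typedef struct { B base; int c_field; } C;

/* Returns the sum of one field from each level, or -1 on allocation failure. */
int sum_through_base_views(void) {
    C *c = malloc(sizeof *c);
    if (c == NULL) return -1;

    *c = (C){0};            /* the allocation now has effective type C */
    c->base.base.id = 1;
    c->base.b_field = 2;
    c->c_field = 4;

    B *b = (B *)c;          /* refers to the same B as &c->base */
    A *a = (A *)c;          /* refers to the same A as &c->base.base */
    assert((void *)b == (void *)&c->base);
    assert((void *)a == (void *)&c->base.base);

    /* Each lvalue's type matches the effective type of the sub-object it names. */
    int sum = a->id + b->b_field + c->c_field;
    free(c);
    return sum;
}
```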

    Case 2: The C is not fully initialized

    That could be something like this:

    C *c = malloc(sizeof(*c));
    *(B *)c = (B){0};
    

    We have thereby assigned to the initial B-sized portion of the allocated object via an lvalue of type B, so the effective type of that initial portion is B. The allocated space does not at this point contain an object of (effective) type C. We can access the B and its members, read or write, via any acceptably-typed lvalues referring to them, as discussed above. But we have a strict aliasing violation if we

    • attempt to read *c as a whole (e.g. C c2 = *c;);
    • attempt to read C members other than base (e.g. X x = c->another;); or
    • attempt to read the allocated object via an lvalue of most unrelated types (e.g. Unrelated_but_not_char u = *(Unrelated_but_not_char *) c;).

    The first two of those cases are of interest here, and they make sense in terms of the dynamically allocated object, when interpreted as a C, not being fully initialized. Similar incomplete-initialization cases can arise with automatically allocated objects, too; they also produce undefined behavior, but by different rules.

    Note well, however, that there is no strict aliasing violation for any write to the allocated space, because any such write will (re)assign the effective type of (at least) the region that is written to.
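A small sketch of that point, applicable only to allocated storage (which has no declared type): each write through a non-character lvalue sets the effective type of the bytes it covers, so a later read through the same type is consistent with the SAR. The field names and the function `retype_by_writing` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical fields for illustration. */
typedef struct { int id; } A;
typedef struct { A base; int b_field; } B;
typedef struct { B base; int c_field; } C;

/* Returns 3.5f on success, -1.0f on allocation failure. */
float retype_by_writing(void) {
    void *p = malloc(sizeof(C));
    if (p == NULL) return -1.0f;

    *(B *)p = (B){ .b_field = 1 };   /* initial region now has effective type B */
    int first = ((B *)p)->b_field;   /* read via B: matches the effective type */
    assert(first == 1);

    *(float *)p = 2.5f;              /* a new write re-types the bytes it covers */
    float second = *(float *)p;      /* read via float: matches again */

    /* Reading ((B *)p)->base here would no longer match the effective type. */
    free(p);
    return (float)first + second;
}
```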

    And that brings us to the main tricksome bit. What if we do this:

    C *c = malloc(sizeof(*c));
    c->base = (B){0};
    

    ? Or this:

    C *c = malloc(sizeof(*c));
    c->another = 0;
    

    The allocated object does not have any effective type before the first write to it (and in particular, it does not have effective type C), so do write-to-member expressions via *c even make sense? Are they well-defined? The letter of the standard might support an argument that they are not, but no implementation adopts such an interpretation, and there is no reason to think that any ever would.

    The interpretation most consistent with both the letter of the standard and universal practice is that writing through a member-access lvalue constitutes simultaneously writing to the member and to its host aggregate, thus setting the effective type of the whole region, even though only one member's value is written. Of course, that still does not make it ok to read members whose values have not been written -- because their values are indeterminate, not because of the SAR.
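Under that interpretation, a sketch of the safe pattern looks like the following: write one member through a member-access lvalue, then read back only what was written. The field names and the function `write_one_member` are assumptions for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical fields for illustration. */
typedef struct { int id; } A;
typedef struct { A base; int b_field; } B;
typedef struct { B base; int c_field; } C;

/* Returns the value written to the member, or -1 on allocation failure. */
int write_one_member(void) {
    C *c = malloc(sizeof *c);
    if (c == NULL) return -1;

    c->c_field = 42;        /* first write to the allocation, via a member of *c */
    int v = c->c_field;     /* reading the member just written is fine */
    assert(v == 42);

    /* c->base.b_field has not been written, so its value is indeterminate;
       reading it would be a problem for that reason, not because of the SAR. */
    free(c);
    return v;
}
```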

    That leaves this case:

    C *c = malloc(sizeof(*c));
    *(B *)c = (B){0};
    B b2 = c->base;            // What about this?
    

    That is, if the effective type of an initial region of the allocated space is B, can we use a member-access lvalue based on type C to read the stored value of that B region? Again, one might argue not, on the basis that there is no actual C, but in practice, no implementation makes that interpretation. The effective type of the object being read -- the initial region of the allocated space -- is the same as the type of the lvalue used for access, so in that sense there is no SAR violation. That the host C is wholly hypothetical is a question primarily of syntax, not semantics, because the same region can definitely be read as an object of the same type via an alternative expression.


    * But the SAR nevertheless forestalls any debate on this point by providing that "an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union)" is among the lvalue types through which an object's stored value may be accessed. This clears any ambiguity surrounding the position that accessing a member also constitutes accessing any objects containing it.