Search code examples
cstrict-aliasing

Why is the reference of structs with similar first members allowed despite C strict aliasing rules?


First, I apologize if this appears to be a duplicate, but I couldn't find exactly this question elsewhere

I was reading through N1570, specifically §6.5¶7, which reads:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.

This reminded me of a common idiom I had seen in (BSD-like) socket programming, especially in the connect() call. Though the second argument to connect() is a struct sockaddr *, I have often seen passed to it a struct sockaddr_in *, which appears to work because they share a similar initial element. My question is:

To which contingency detailed in the above rule does this situation apply and why, or is it now undefined behavior that's an artifact of previous standard(s)?


Solution

  • The rules about common initial sequences goes back to 1974. The earliest rules about "strict aliasing" only go back to 1989. The intention of the latter was not that they trump everything else, but merely that compilers be allowed to perform optimizations that their customers would find useful without being branded non-conforming. The Standard makes clear that in situations where one part of the Standard and/or an implementation's documentation would describe the behavior of some action but another part of the Standard would characterize it as Undefined Behavior, implementations may opt to give priority to the first, and the Rationale makes clear that the authors thought "the marketplace" would be better placed than the Committee to determine when implementations should do so.

    Under a sufficiently pedantic reading of the N1570 6.5p7 constraints, almost all programs violate them, but in ways that won't matter unless an implementation is being sufficiently obtuse. The Standard makes no attempt to list all the situations in which an object of one type may be accessed by an lvalue of another, but rather those where a compiler must allow for an object of one type to be accessed by a seemingly unrelated lvalue of another. Given the code sequence:

    int x;
    int *p[10];
    p[2] = &someStruct.intMember;
    ...
    *p[2] = 23;
    x = someStruct.intMember;
    

    In the absence of the rules in 6.5p7, unless a compiler kept track of where p[2] came from, it would have no reason to recognize that the read of someStruct.member might be targeting storage that was just written using *p[2]. On the other hand, given the code:

    int x;
    int *p[10];
    ...
    someStruct.intMember = 12;
    p[2] = &someStruct.intMember;
    x = *p[2];
    

    Here, there is no rule that would actually allow the storage associated with a structure to be accessed by an lvalue of that member type, but unless a compiler is being deliberately blind, it would be able to see that after the first assignment to someStruct.intMember, the address of that member is being taken, and should either:

    1. Account for all actions that will ever be done with the resulting pointer, if it is able to do so, or
    2. Refrain from assuming that the structure's storage will not be accessed between the previous and succeeding actions using the structure's type.

    I don't think it ever occurred to the people who were writing the rules that would later be renumbered as N1570 6.5p7 that they would be construed so as to disallow common patterns that exploited the Common Initial Sequence rule. As noted, most programs violate the constraints of 6.5p7, but do so in ways that would be processed predictably by any compiler that isn't being obtuse; those using the Common Initial Sequence guarantees would have fallen into that category. Since the authors of the Standard recognized the possibility of a "conforming" compiler that was only capable of meaningfully processing one contrived and useless program, the fact that an obtuse compiler could abuse the "aliasing rules" wasn't seen as a defect.