c, undefined-behavior

Does casting an array to a char* imply a limit on the string's length?


What should this code print?

#include <stdio.h>
#include <string.h>

struct S
{
    int x[1];
};

union U
{
    struct S arr[64];
    char s[256];
};

int main()
{
    union U u;
    strcpy(u.s, "abcdefghijklmnopqrstuvwxyz");
    size_t len = strlen((char*)&u.arr[1].x);
    puts(len > 10 ? "YES" : "NO");
    return 0;
}

Clang always prints "YES". GCC 8.1 prints "NO" when optimizations are enabled, yet emits no warnings. Is it taking advantage of some undefined behavior?


Solution

  • Implementations that are suitable for systems programming will allow a pointer to an inner object to be used to derive pointers to containing objects. The C Standard does not, however, seek to require that all conforming implementations be suitable for any purpose whatsoever (the authors acknowledge in the rationale that it would be possible to construct a conforming implementation which is of such low quality as to be essentially useless), much less that they all be suitable for systems programming. On the other hand, it does describe a fairly easy means by which an implementation intended for systems programming can provide the necessary semantics.

    In particular, while the Standard does not mandate that a direct cast from T* to V* will behave as a conversion from T* to U*, followed by a conversion from U* to V* if there exists some type U* supporting round-trip conversions to/from T* and V*, such behavior was certainly commonplace when it was written. Many actions whose behavior would otherwise not be defined by the Standard would be defined on an implementation that guarantees that pointer casts behave transitively.

    Among other things, the Standard specifies that a pointer to an aggregate (array, struct, or union), suitably converted, will yield a pointer to its first element/member and vice versa. Thus, converting &u.arr[0].x[0] to an int(*)[1], converting that to a struct S*, then to a union U*, and then finally to a char*, would yield a char* which can be used to index the entire structure. While the Standard may allow a conforming implementation to treat a cast to char* in a way that only allows access to the specific "inner" object whose address was converted, it hardly implies that implementations should do so, nor that such a restriction would not make an implementation unsuitable for systems programming.

    PS--I could certainly see benefits to a range-limiting qualifier that would indicate that a pointer to a particular object will not be used to derive the address of anything outside that object. Given something like:

    struct foo {int x,y,z; };
    ...
    int test(struct foo *restrict it)
    {
      it->y++;
      doSomething(&it->x);
      it->y--;
      return it->y;
    }
    

    the existence of such a qualifier on the parameter to doSomething() would allow a compiler to optimize out the operations on it->y whether or not it knew anything about the code for doSomething(). Note, however, that to be most useful such a qualifier would require that--as with restrict--operations that would normally launder the pointer would not erase its effects. Consequently, it makes more sense to treat unqualified casts as laundering pointers to the extent possible than to treat casts as yielding range-limited pointers except when explicitly laundered.