c language-lawyer undefined-behavior flexible-array-member

Flexible array members can lead to undefined behavior?


  1. By using flexible array members (FAMs) within structure types, are we exposing our programs to the possibility of undefined behavior?

  2. Is it possible for a program to use FAMs and still be a strictly conforming program?

  3. Is the offset of the flexible array member required to be at the end of the struct?

The questions apply to both C99 (TC3) and C11 (TC1).

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

int main(void) {
    struct s {
        size_t len;
        char pad;
        int array[];
    };

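    /* room for the struct plus one element of the flexible array member */
    struct s *s = malloc(sizeof *s + sizeof *s->array);
    if (s == NULL)
        return EXIT_FAILURE;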

    printf("sizeof *s: %zu\n", sizeof *s);
    printf("offsetof(struct s, array): %zu\n", offsetof(struct s, array));

    s->array[0] = 0;
    s->len = 1;

    printf("%d\n", s->array[0]);

    free(s);
    return 0;
}

Output:

sizeof *s: 16
offsetof(struct s, array): 12
0

Solution

  • The Short Answer

    1. Yes. Common conventions of using FAMs expose our programs to the possibility of undefined behavior. Having said that, I'm unaware of any existing conforming implementation that would misbehave.

    2. Possible, but unlikely. Even if we don't actually reach undefined behavior, we are still likely to fail strict conformance.

    3. No. The offset of the FAM is not required to be at the end of the struct; it may overlay any trailing padding bytes.

    The answers apply to both C99 (TC3) and C11 (TC1).


    The Long Answer

    FAMs were first introduced in C99 (TC0) (Dec 1999), and their original specification required the offset of the FAM to be at the end of the struct. The original specification was well-defined and as such couldn't lead to undefined behavior or be an issue with regard to strict conformance.

    C99 (TC0) §6.7.2.1 p16 (Dec 1999)

    [This document is the official standard, it is copyrighted and not freely available]

    The problem was that common C99 implementations, such as GCC, didn't follow this requirement and allowed the FAM to overlay any trailing padding bytes. Their approach was considered more efficient, and since making them follow the standard's requirement would have broken backwards compatibility, the committee chose to change the specification instead. As of C99 TC2 (Nov 2004), the standard no longer required the offset of the FAM to be at the end of the struct.

    C99 (TC2) §6.7.2.1 p16 (Nov 2004)

    [...] the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply.

    The new specification removed the statement that required the offset of the FAM to be at the end of the struct, and it introduced a very unfortunate consequence, because the standard gives the implementation the liberty not to keep the values of any padding bytes within structures or unions in a consistent state. More specifically:

    C99 (TC3) §6.2.6.1 p6

    When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.

    This means that if any of our FAM elements correspond to (or overlay) any trailing padding bytes, then upon storing to a member of the struct they may take unspecified values. We don't even need to ponder whether this applies to a value stored to the FAM itself; even the strict interpretation, under which it applies only to members other than the FAM, is damaging enough.

    #include <stdio.h>
    #include <stdlib.h>
    #include <stddef.h>
    
    int main(void) {
        struct s {
            size_t len;
            char pad;
            int array[];
        };
    
        struct s *s = malloc(sizeof *s + sizeof *s->array);
        if (s == NULL)
            return EXIT_FAILURE;
    
        if (sizeof *s > offsetof(struct s, array)) {
            s->array[0] = 123;
            s->len = 1; /* any padding bytes take unspecified values */
    
            printf("%d\n", s->array[0]); /* indeterminate value */
        }
    
        free(s);
        return 0;
    }
    

    Once we store to a member of the struct, the padding bytes take unspecified values, and therefore any assumption made about the values of the FAM elements that correspond to any trailing padding bytes is now false. This means that any such assumption causes us to fail strict conformance.

    Undefined behavior

    Although the values of the padding bytes are "unspecified values", the same can't be said about an object of the FAM's type that overlays them, because an object representation built from unspecified values can be a trap representation. So the only standard term which describes these two possibilities would be "indeterminate value". If the type of the FAM happens to have trap representations, then accessing it is not just a concern of an unspecified value, but undefined behavior.

    But wait, there's more. If we agree that the only standard term to describe such a value is "indeterminate value", then even if the type of the FAM happens not to have trap representations, we've reached undefined behavior, since the official interpretation of the C standards committee is that passing indeterminate values to standard library functions is undefined behavior.