c language-lawyer c89 flexible-array-member

Multiple structures in a single malloc invoking undefined behaviour?


The page Use the correct syntax when declaring a flexible array member says that when a single malloc is used for a header followed by variable-length data, with data[1] hacked into the struct as its last member,

This example has undefined behavior when accessing any element other than the first element of the data array. (See the C Standard, 6.5.6.) Consequently, the compiler can generate code that does not return the expected value when accessing the second element of data.

I looked up the C Standard 6.5.6, and could not see how this would produce undefined behaviour. I've used a pattern that I'm comfortable with, where the header is implicitly followed by data, using the same sort of malloc,

#include <stdlib.h> /* EXIT malloc free */
#include <stdio.h>  /* printf */
#include <string.h> /* strlen memcpy */

struct Array {
    size_t length;
    char *array;
}; /* +(length + 1) char */

static struct Array *Array(const char *const str) {
    struct Array *a;
    size_t length;
    length = strlen(str);
    if(!(a = malloc(sizeof *a + length + 1))) return 0;
    a->length = length;
    a->array = (char *)(a + 1); /* UB? */
    memcpy(a->array, str, length + 1);
    return a;
}

/* Take a char off the end just so that it's useful. */
static void Array_to_string(const struct Array *const a, char (*const s)[12]) {
    const int n = a->length ? a->length > 9 ? 9 : (int)a->length - 1 : 0;
    sprintf(*s, "<%.*s>", n, a->array);
}

int main(void) {
    struct Array *a = 0, *b = 0;
    int is_done = 0;
    do { /* Try. */
        char s[12], t[12];
        if(!(a = Array("Foo!")) || !(b = Array("To be or not to be."))) break;
        Array_to_string(a, &s);
        Array_to_string(b, &t);
        printf("%s %s\n", s, t);
        is_done = 1;
    } while(0); if(!is_done) {
        perror(":(");
    } {
        free(a);
        free(b);
    }
    return is_done ? EXIT_SUCCESS : EXIT_FAILURE;
}

Prints,

<Foo> <To be or >

The compliant solution uses C99 flexible array members. The page also says,

Failing to use the correct syntax when declaring a flexible array member can result in undefined behavior, although the incorrect syntax will work on most implementations.

Technically, does this C90 code produce undefined behaviour, too? If not, what is the difference? (Or is the Carnegie Mellon wiki incorrect?) On the implementations where this does not work, what is it that makes it fail?


Solution

  • This should be well defined:

    a->array = (char *)(a + 1);
    

    It creates a pointer to one element past the end of an array of size 1 (for pointer arithmetic, a single object behaves like an array of length one) but does not dereference it as a struct Array. And because a->array now points to bytes that do not yet have an effective type, you can use them safely.

    This only works, however, because you're using the bytes that follow as an array of char. If you instead tried to create an array of some other type whose alignment requirement is greater than 1, you could have alignment issues.

    For example, if you compiled a program for ARM with 32-bit pointers and you had this:

    struct Array {
        int size;
        uint64_t *a;
    };
    ...
    struct Array *a = malloc(sizeof *a + (length * sizeof(uint64_t)));
    a->size = length;
    a->a = (uint64_t *)(a + 1);       // pointer may be misaligned
    a->a[0] = 0x1111222233334444ULL;  // write through a misaligned pointer is UB
    

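    Wherever sizeof(struct Array) is not a multiple of the alignment that uint64_t requires, the pointer is misaligned, the write is undefined behaviour, and the program could crash. So in general you shouldn't depend on this. Best to stick with a flexible array member, which the standard guarantees will work.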
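
As a rough illustration of that last point, here is a minimal sketch of the uint64_t example rewritten with a C99 flexible array member. The names struct Array64 and Array64() are made up for this sketch and do not come from the question or the linked page. It assumes a C99 (or later) compiler, so it is not an option for strict C90 code.

```c
#include <stdint.h> /* uint64_t SIZE_MAX */
#include <stdlib.h> /* malloc free */

/* Hypothetical type for illustration. The flexible array member must be
 the last member; the compiler inserts any padding needed so that `a`
 is suitably aligned for uint64_t. */
struct Array64 {
    int size;
    uint64_t a[]; /* C99 flexible array member */
};

/* Hypothetical constructor: one malloc for the header plus `size`
 elements. Returns null on a negative size, overflow, or allocation
 failure. The elements are left uninitialised. */
static struct Array64 *Array64(const int size) {
    struct Array64 *a;
    if(size < 0) return 0;
    /* Guard against overflow in the size computation below. */
    if((size_t)size > (SIZE_MAX - sizeof *a) / sizeof a->a[0]) return 0;
    if(!(a = malloc(sizeof *a + (size_t)size * sizeof a->a[0]))) return 0;
    a->size = size;
    return a;
}
```

With this layout there is no pointer to compute by hand: a->a[0] through a->a[size - 1] are ordinary element accesses, and the whole object is released with a single free(a).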