Tags: c, malloc, undefined-behavior

Does reading an uninitialized malloc() memory invoke Undefined Behaviors?


I know this is a really basic question, and there may be a duplicate, but I couldn't find a strict answer to this specific question that refers to the Standard. (I have seen some people say it is UB and others say it is not.)

If I allocate a block of memory without filling data into it,

int* ptr = malloc(10 * sizeof(int));

and then try to read it, the values there will be garbage.

But is this classified as undefined behavior? Or is it merely bad practice, but at least not UB?


Solution

    Summary

    The behavior of reading uninitialized memory provided by malloc is not undefined per se. It can result in undefined behavior if memory containing a trap representation is read with a non-character type, but this can occur only if the type has a trap representation. (Most modern C implementations do not have trap representations for integer types.)

    However, while it is not fully undefined, neither is it fully defined. Attempting to read uninitialized memory is not required to actually read the memory.

    Details

    C 2018 7.22.3.4 2 says, of the malloc function with parameter size:

    The malloc function allocates space for an object whose size is specified by size and whose value is indeterminate.

    C 3.19.2 1 defines indeterminate value as:

    either an unspecified value or a trap representation

    C 3.19.3 1 defines unspecified value as:

    valid value of the relevant type where this document imposes no requirements on which value is chosen in any instance

    Nothing in this makes the behavior undefined.

    The behavior of reading a trap representation with a non-character type is not defined by the C standard, per 6.2.6.1 5. So, if the memory is read with a type that has trap representations, and the bits in the memory happen to form one of them, then the behavior is undefined.

    Trap representations in integer types are rare in modern C implementations. Many years ago, some systems would reserve certain bit patterns, such as the 16-bit 8000₁₆, to represent uninitialized or invalid data, and attempting to use such a value in arithmetic would generate a trap. In a C implementation without trap representations in some type T, accessing uninitialized data through type T cannot encounter a trap representation. So the result must be an unspecified (and hence valid) value of the type.

    Further, there is nothing else in the C standard that would make this behavior undefined. There is a rule in 6.3.2.1 2 that accessing an uninitialized object of automatic storage duration has undefined behavior if its address is not taken. However, the memory provided by malloc has allocated storage duration, not automatic. (That rule is an accommodation to certain Hewlett-Packard hardware with the capability of marking a register as uninitialized and trapping when it is used.)

    Also, whole structures and unions are never trap representations, regardless of the types of their members. The most common trap representation in modern C implementations is a floating-point signaling NaN (Not a Number).

    Note that the value in the allocated memory is unspecified, and the definition above states “this document imposes no requirements on which value is chosen in any instance.” That means if you do this:

    unsigned *p = malloc(sizeof *p);
    printf("%u\n", *p);
    printf("%u\n", *p);
    

    the C standard imposes no requirement on which value is chosen for *p in the first printf and no requirement on which value is chosen in the second printf, not even a requirement that they be the same as each other. An “unspecified value” may act like it has bits that are changing by themselves from moment to moment. So, the behavior is not undefined—it cannot allow “anything” to happen to your program; your program cannot suddenly jump to a different function or wipe out other data—but neither is it defined to act like the memory has bits with fixed values.

    This means you cannot reliably read the uninitialized memory—reads of the memory are not guaranteed to produce the bits that are actually in physical memory.

    Discussion

    To see why the C standard allows the program to act like the bits in memory may be changing, consider this code:

    unsigned a = *p + 3;
    unsigned b = *p + 4;
    

    For that code in normal situations, the compiler might generate assembly like this:

    // As we start, registers r7, r8, and r9 already contain p,
    // the address of a, and the address of b, respectively.
    load  r3, (r7) // Get value of *p from memory.
    add   r3, #3   // Add 3.
    store r3, (r8) // Store sum to a.
    load  r3, (r7) // Get value of *p from memory.
    add   r3, #4   // Add 4.
    store r3, (r9) // Store sum to b.
    

    If the memory p points to happened to contain 0, then these instructions would store 3 in a and 4 in b. However, the rule that uninitialized memory is not required to behave as if it had a fixed value means the compiler’s optimizer is allowed to eliminate the load instructions. Hypothetically, that could result in instructions such as:

    add   r3, #3   // Add 3.
    store r3, (r8) // Store sum to a.
    add   r3, #4   // Add 4.
    store r3, (r9) // Store sum to b.
    

    If r3 happens to contain 0 when this code sequence starts, then 3 will be stored in a, and 7 will be stored in b. There is no possible value *p could have that would result in *p + 3 being 3 and *p + 4 being 7. So this code acts as if *p has changed by itself.

    In practice, an optimizer that removed the load instructions here would also recognize that the subsequent instructions do not depend on any fixed value, and it would remove them too. However, real-world optimizations are more complex than this. The license granted by the C standard allows the compiler to remove the parts of the code that it can determine are not using defined values, even if it cannot determine everything about the program.