c pointers printf language-lawyer undefined-behavior

Question about the behavior of uninitialized pointers to integers when used in the printf function


I am new to this particular forum, so if there are any egregious formatting choices, please let me know, and I will promptly update.

In the book C Programming: A Modern Approach (authored by K. N. King), the following passage is written:

If a pointer variable p hasn't been initialized, attempting to use the value of p in any way causes undefined behavior. In the following example, the call of printf may print garbage, cause the program to crash, or have some other effect:

int *p;
printf("%d", *p);

As far as I understand pointers and how the compiler treats them, the declaration int *p effectively says, "Hey, if you dereference p in the future, I will look at a block of four consecutive bytes in memory, whose starting address is the value contained in p, and interpret those 4 bytes as a signed integer."

I am not sure whether that is correct. If it is, then I am a little confused about why the aforementioned block of code:

  1. is classified as undefined behavior
  2. can cause programs to crash
  3. can have some other effect

Commenting on the above-numbered cases:

My understanding of undefined behavior is that, at run time, anything can happen. With that being said, in the above code it appears to me that only a very defined subset of things can happen. I understand that p (due to its lack of initialization) is storing a random address that could point anywhere in memory. However, when printf is passed the dereferenced value *p, won't the compiler just look at the 4 consecutive bytes of memory (which start at whatever random address) and interpret those 4 bytes as a signed integer?

Therefore, printf should only do one thing: print a number that ranges anywhere from -2,147,483,648 to 2,147,483,647. Clearly that is a lot of different possible outputs, but does that really qualify as "undefined behavior"? Further, how could such "undefined behavior" lead to a "program crash" or "have some other effect"?

Any clarification would be greatly appreciated! Thanks!


Solution

  • The value of an uninitialized variable is indeterminate. It could hold any value (including 0), and it's even possible that a different value could be read each time you attempt to read it. It's also possible that the value is a trap representation, meaning that attempting to read it can trigger a processor exception that crashes the program.

    Assuming you got lucky and were able to read a value for p, that value may not correspond to an address that is mapped into the process's memory space, because of the virtual memory model most systems use. If you then attempt to read from that address by dereferencing the pointer, the access can trigger a segmentation fault that crashes the program.

    Notice that in both of these scenarios the crash occurs before printf is even called, because the argument *p has to be evaluated before the call is made.

    Also, compilers are allowed to assume your program does not have undefined behavior and will perform optimizations based on that assumption. That can make your program behave in ways you might not expect.

    As for why doing these things is undefined behavior, it is because the C standard says so. In particular, Annex J.2 gives as an example of undefined behavior:

    The value of an object with automatic storage duration is used while it is indeterminate. (6.2.4, 6.7.9, 6.8)