Tags: c, string-literals, array-initialization

Are char arrays guaranteed to be null terminated?


#include <stdio.h>

int main() {
    char a = 5;
    char b[2] = "hi"; // No explicit room for `\0`.
    char c = 6;

    return 0;
}

"Whenever we write a string, enclosed in double quotes, C automatically creates an array of characters for us, containing that string, terminated by the \0 character." (http://www.eskimo.com/~scs/cclass/notes/sx8.html)

In the above example, b only has room for 2 characters, so the null terminator has no spot to be placed in. Yet the compiler appears to reorganize the memory store instructions so that a and c are stored before b in memory, leaving room for a \0 just past the end of the array.

Is this expected or am I hitting undefined behavior?


Solution

  • It is allowed to initialize a char array with a string literal as long as the array is large enough to hold all of the characters in the string, not counting the null terminator.

    This is detailed in section 6.7.9p14 of the C11 standard:

    An array of character type may be initialized by a character string literal or UTF-8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.

    However, this also means that you can't treat the array as a string, since it's not null-terminated. As written, your code is fine because it performs no string operations on b.

    What you can't do is initialize with a string that's too long, i.e.:

    char b[2] = "hello";
    

    This provides more initializers than can fit in the array, which is a constraint violation. Section 6.7.9p2 states this as follows:

    No initializer shall attempt to provide a value for an object not contained within the entity being initialized.

    If you were to declare and initialize the array like this:

    char b[] = "hi"; 
    

    Then b would be an array of size 3, which is large enough to hold the two characters in the string constant plus the terminating null byte, making b a string.

    To summarize:

    If the array has a fixed size:

    • If the string constant used to initialize it is shorter than the array (its characters and the null terminator both fit), the array will contain the characters in the string with all remaining elements set to 0, so the array will contain a string.
    • If the array is exactly large enough to contain the characters of the string but not the null terminator, the array will contain the characters in the string without the null terminator, meaning the array is not a string.
    • If the string constant (not counting the null terminator) is longer than the array, this is a constraint violation: the compiler must issue a diagnostic, and the behavior is undefined if the program is translated anyway.

    If the array does not have an explicit size, the array will be sized to hold the string constant plus the terminating null byte.