Is there a difference between using int and size_t for array initialisation in C?


I have written the following brief C program (using the C90 standard) while learning how to use buffered input:

#include <stdio.h>
#include <stdlib.h>

#define MAXLINE 10

int getLine(char s[]) {
    int ch;

    int chCount;
    for (chCount = 0; (ch = getchar()) != '\n' && ch != EOF; ++chCount) {
        if (chCount > MAXLINE) {
            puts("Interrupted read");
            break;
        }

        s[chCount] = ch;
    }

    return chCount;
}

int main(int argc, char *argv[]) {
    char buf[MAXLINE];

    int count;

    while (1) {
        count = getLine(buf);
        printf("%s: %d\n", buf, count);
    }

    return EXIT_SUCCESS;
}

If I execute the above program with the test input "one two three", I get the following output:

> one two three
Interrupted read
one two thr�#��6@&'o�: 11
ene two thr�#��6@&'o�: 1

Most of this output is expected: the input is longer than 11 characters and so the read is split into two operations. No problem. My concern is, of course, �#��6@&'o�. I remember reading that creating an array using the char buf[x] syntax will create an array that is filled with junk data before writing, which is fine. As far as I can remember, however, I learned this from a conversation with Microsoft's Copilot, so is it actually true? If so, why is this array filled with more junk than its declared size? Is it significant that the length of this output is 21 characters - double the length of the buffer plus one?

What confuses me more - and is the origin of the title of this question - is the following behaviour:

If I modify main() so that the buffer declaration is the following:

int main(int argc, char *argv[]) {
    size_t size = MAXLINE;

    char buf[size];
...

then the output of the program becomes

> one two three
Interrupted read
one two thr: 11
ene two thr: 1

This confounds me entirely. I understand that size_t has a special significance, being a platform-defined type, as attested to in this Stack Overflow post, but why would what seems to be only a difference in type size change the behaviour of the array so drastically?

In summary, my questions are the following:

  • Is it true that initialising the array with a fixed size specifier creates an array that is at first filled with junk data?
  • Why is the array filled with more data than its int declared size would seem to allow?
  • Is it significant that the length of this output is 21 characters - double the length of the buffer plus one?
  • Why does changing nothing other than the type of the array size specifier from int to size_t seem to fix the problem entirely?

Solution

  • Is it true that initialising the array with a fixed size specifier creates an array that is at first filled with junk data?

    That depends on the array's storage duration. Local variables declared inside functions have automatic storage duration, which means that, without an explicit initializer, they are not initialized and contain indeterminate values ("junk"). This is intentional, since zero initialization, for example, costs extra execution time.

  • Why is the array filled with more data than its int declared size would seem to allow?

    It is not; you are merely reading it out of bounds when you print. A string must end with a null terminator, and getLine never writes one, so printf with %s keeps reading past the end of the array until it happens to find a zero byte. See: How should character arrays be used as strings?

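  • Is it significant that the length of this output is 21 characters - double the length of the buffer plus one?

    No, the buffer size has nothing to do with it. In practice, the length of the output is determined by where the first zero byte happens to lie in the memory that follows the array.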

  • Why does changing nothing other than the type of the array size specifier from int to size_t seem to fix the problem entirely?

    Presumably because the change shuffles the stack memory layout: size_t typically takes more memory than int, and char buf[size] with a non-constant size is a variable-length array (a C99 feature, not part of C90), which the compiler may place differently. There is no predictable behavior when you access an array out of bounds, so the symptoms caused by doing so aren't particularly meaningful. See: What is undefined behavior and how does it work?