arrays c char

Why do I need to add NULL termination in my const char arrays?


edit: meant 'null termination' wherever 'padding' is used.

I am using the following function to compare a string (in this case the argv[1]) to an array of accepted arguments.

#include <stdio.h>
#include <string.h>

const char *ARGS_HELP[] = {"-h", "--help", "-help", "help", NULL};
const char *ARGS_BUILD[] = {"-b", "--build", NULL};

int strcmp_array(char *a, const char *b[])
{
        // Debug only: indexing b[] past its last element is undefined behavior.
        //printf("a = \"%s\"  b[0] = \"%s\" b[1] = \"%s\" b[2] = \"%s\" b[3] = \"%s\" b[4] = \"%s\" b[5] = \"%s\" b[6] = \"%s\"\n",a,b[0],b[1],b[2],b[3],b[4],b[5],b[6]);
        int len = 0;
        while (b[len] != NULL)
        {
                len++;
        }
        for (int i = 0; i < len; i++)
        {
                //printf("Comparing \"%s\" to \"%s\"\n",a,b[i]);
                if (strcmp(a, b[i]) == 0)
                {
                        return 0;
                }
        }
        return 1;
}

int main(int argc, char *argv[])
{
    if (argc == 1 || strcmp_array(argv[1], ARGS_HELP) == 0)
    {
        printf("%s", HELP_PAGE);
        return 0;
    }
}

I noticed that when I ran my program with "--build" as argv[1], it would still trigger the help page. I wondered why, since ARGS_HELP and ARGS_BUILD are two separate const char arrays. It turned out strcmp_array was looping past the end of ARGS_HELP and on through the entries of ARGS_BUILD, because I had not inserted NULL at the end of each array to signify its end. Can someone explain why this happens? Why doesn't the compiler automatically insert a NULL at the end of const char arrays?

With padding

const char *ARGS_HELP[] = {"-h", "--help", "-help", "help", NULL};
const char *ARGS_BUILD[] = {"-b", "--build", NULL};

a = "--build"  b[0] = "-h" b[1] = "--help" b[2] = "-help" b[3] = "help" b[4] = "(null)" b[5] = "(null)" b[6] = "-b"

Without null padding

const char *ARGS_HELP[] = {"-h", "--help", "-help", "help"};
const char *ARGS_BUILD[] = {"-b", "--build"};

a = "--build"  b[0] = "-h" b[1] = "--help" b[2] = "-help" b[3] = "help" b[4] = "-b" b[5] = "--build"

Should I always pad my char arrays with NULL from now on?

I am not a computer scientist. I am relatively new to C.


Solution

  • This isn't padding, it's adding a trailing NULL so that iteration of these structures is super simple. You just move along until you hit a NULL, then stop. It's especially common with lists of char* strings, like this, but you will also see it on lists of struct pointers and in other situations.

    This is not unlike how C strings are NUL (character) terminated. Many other languages store a length field alongside the character data, which adds overhead and complexity.

    The alternative is you'd have to know how many there are, then pass that information in as well, which is a hassle, especially if you get that number wrong.

    It's not that you have to; it's that it's convenient when the code you're using expects things to work that way. Notice how argc and argv are provided separately, even though argv is itself NULL-terminated (the C standard guarantees that argv[argc] is a null pointer). Why both? A design decision made many decades ago, but likely one that lets you quickly test argc to see if you have enough arguments before moving along. There's no strlen() equivalent for arbitrary pointer arrays.
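
As a minimal sketch of the two conventions described above (not the asker's exact code): in_list walks a NULL-terminated list, while in_list_n takes an explicit element count. The names in_list, in_list_n, ARRAY_LEN and ARGS_BUILD_COUNTED are made up for illustration. Without the sentinel, the original loop reads past the end of the array, which is undefined behavior; running into the entries of ARGS_BUILD is just what happened to sit next in memory on that particular build.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: element count of a true array (not a pointer). */
#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))

/* NULL-terminated lists, as in the question. */
const char *const ARGS_HELP[]  = {"-h", "--help", "-help", "help", NULL};
const char *const ARGS_BUILD[] = {"-b", "--build", NULL};

/* Hypothetical list without a sentinel; only safe with an explicit count. */
const char *const ARGS_BUILD_COUNTED[] = {"-b", "--build"};

/* Sentinel style: walk the list until the terminating NULL pointer.
   Returns 1 if needle matches an entry, 0 otherwise. */
int in_list(const char *needle, const char *const list[])
{
    for (size_t i = 0; list[i] != NULL; i++) {
        if (strcmp(needle, list[i]) == 0)
            return 1;
    }
    return 0;
}

/* Counted style: the caller supplies the number of entries. */
int in_list_n(const char *needle, const char *const list[], size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(needle, list[i]) == 0)
            return 1;
    }
    return 0;
}
```

With these, in_list(argv[1], ARGS_HELP) returns 1 only for one of the help spellings, and in_list_n(argv[1], ARGS_BUILD_COUNTED, ARRAY_LEN(ARGS_BUILD_COUNTED)) does the same job for the unterminated list. Note that ARRAY_LEN only works where the array itself is in scope; once the array has decayed to a pointer inside a function, sizeof gives the size of the pointer, which is why the count has to be passed in.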