What does gets() save when it reads just a newline


Here's the description of gets() from Prata's C Primer Plus:

It gets a string from your system's standard input device, normally your keyboard. Because a string has no predetermined length, gets() needs a way to know when to stop. Its method is to read characters until it reaches a newline (\n) character, which you generate by pressing the Enter key. It takes all the characters up to (but not including) the newline, tacks on a null character (\0), and gives the string to the calling program.

It got me curious as to what would happen when gets() reads in just a newline. So I wrote this:

  #include <stdio.h>

  int main(void)
  {
    char input[100];

    while(gets(input))
    {
      printf("This is the input as a string: %s\n", input);
      printf("Is it the string end character? %d\n", input == '\0');
      printf("Is it a newline string? %d\n", input == "\n");
      printf("Is it the empty string? %d\n", input == "");
    }

    return 0;
  }

Here's my interaction with the program:

$ ./a.out
This is some string
This is the input as a string: This is some string
Is it the string end character? 0
Is it a newline string? 0
Is it the empty string? 0

This is the input as a string:
Is it the string end character? 0
Is it a newline string? 0
Is it the empty string? 0

The second block is the one of interest: it is what happens when all I press is Enter. What exactly is input in that case? It doesn't seem to match any of my guesses: \0, \n, or "".


Solution

  • This part in the description of gets might be confusing:

    It takes all the characters up to (but not including) the newline

    It might be better to say that it takes all the characters including the newline but stores all characters not including the newline.

    So if the user enters "some string", the gets function will read "some string" and the newline character from the user's terminal, but store only "some string" in the buffer. The newline character is discarded. This is good, because no one wants the newline character anyway: it's a control character, not part of the data that the user wanted to enter.

    Therefore, if you only press Enter, gets stores an empty string: the first byte of the buffer is \0. Your checks don't show this because, as others have noted, your code has several bugs.


    printf("This is the input as a string: %s\n", input);

    No problem here, though you might want to wrap the string in visible delimiters so that an empty string is easy to spot while debugging:

    printf("This is the input as a string: '%s'\n", input);


    printf("Is it the string end character? %d\n", input == '\0');

    Not good: you want to check one byte here, not the whole buffer. In this expression, input decays to a pointer to its first element, and '\0' is an integer constant with value 0, which the compiler treats as a null pointer constant. So the comparison really asks "is the address of the buffer null?", and for an array the answer is always false.

    The right way is:

    printf("Does the first byte contain the string end character? %d\n", input[0] == '\0');

    This compares just 1 byte to \0.


    printf("Is it a newline string? %d\n", input == "\n");

    Not good: this compares the address of the buffer with the address of the string literal "\n". These are two different objects, so the answer is always false. The right way to compare strings in C is strcmp, declared in <string.h>:

    printf("Is it a newline string? %d\n", strcmp(input, "\n") == 0);

    Note the peculiar usage: strcmp returns 0 when the strings are equal.


    printf("Is it the empty string? %d\n", input == "");

    The same bug here. Use strcmp here too:

    printf("Is it the empty string? %d\n", strcmp(input, "") == 0);


    BTW, as people always say, gets cannot be used securely: it has no way to know the size of the buffer, so a long enough line overflows it. For that reason it was removed from the language in C11. You should use fgets instead, even though it's less convenient:

    char input[100];
    while (fgets(input, sizeof input, stdin))
    {
        ...
    }
    

    This leads to possible confusion: unlike gets, fgets keeps the newline byte in the buffer when the line fits. So if you replace gets in your code with fgets, you will get different results: a bare Enter now gives the string "\n" rather than "". Fortunately, your code (with the comparisons fixed) illustrates the difference clearly.