Search code examples
cwindowsunixstdin

Different behaviour of Ctrl-D (Unix) and Ctrl-Z (Windows)


As per title I am trying to understand the exact behavior of Ctrl+D / Ctrl+Z in a while loop with a gets (which I am required to use). The code I am testing is the following:

#include <stdio.h>
#include <stdlib.h>

int main()
{

    char str[80];

    while(printf("Insert string: ") && gets(str) != NULL) {

        puts(str);
    }

    return 0;
}

If my input is simply a Ctrl+D (or Ctrl+Z on Windows) gets returns NULL and the program exits correctly. The unclear situation is when I insert something like house^D^D (Unix) or house^Z^Z\n (Windows).

  1. In the first case my interpretation is a getchar (or something similar inside the gets function) waits for read() to get the input, the first Ctrl+D flushes the buffer which is not empty (hence not EOF) then the second time read() is called EOF is triggered.
  2. In the second case though, I noticed that the first Ctrl+Z is inserted into the buffer while everything that follows is simply ignored. Hence my understanding is the first read() call inserted house^Z and discarded everything else returning 5 (number of characters read). (I say 5 because otherwise I think a simple Ctrl+Z should return 1 without triggering EOF). Then the program waits for more input from the user, hence a second read() call.

I'd like to know what I get right and wrong of the way it works and which part of it is simply implementation dependent, if any.


Furthermore I noticed that in both Unix and Windows even after EOF is triggered it seem to reset to false in the following gets() call and I don't understand why this happens and in which line of the code.

I would really appreciate any kind of help.


(12/20/2016) I heavily edited my question in order to avoid confusion


Solution

  • The CTRL-D and CTRL-Z "end of file" indicators serve a similar purpose on Unix and Windows systems respectively, but are implemented quite differently.

    On Unix systems (including Unix clones like Linux) CTRL-D, while officially described as the end-of-file character, is actually a delimiter character. It does almost the same thing as the end-of-line character (usually carriage return or CTRL-M) which is used to delimit lines. Both characters tell the operating system that the input line is finished and to make it available the program. The only difference is that with end-of-line character a line feed (CTRL-J) character is inserted at the end of the input buffer to mark the end of the line, while with the end-of-file character nothing is inserted.

    This means when you enter house^D^D on Unix the read system call will first return a buffer of length 5 with the 5 characters house in it. When read is called again to obtain more input, it will then returns of a buffer of length 0 with no characters in it. Since a zero length read on a normal file indicates that the end of file has been reached the gets library function also interprets this as the end of file and stops reading the input. However since it filled the buffer with 5 characters it doesn't return NULL to indicate that it reached end of the file. And since it hasn't actually actually reached end of file, as terminal devices aren't actually files, further calls to gets after this will make further calls to read which will return any subsequent characters that the user types.

    On Windows CTRL-Z is handled much differently. The biggest difference is that it's not treated specially by the operating system at all. When you type house^Z^Z^M on Windows only the carriage return character is given special treatment. Just like on Unix, the carriage return makes the typed line available to the program, though in this case a carriage return and a line feed are added to the buffer to mark the end of the line. So the result is that ReadFile function returns a 9 byte long buffer with the 9 characters house^Z^Z^M^J in it.

    It actually the program itself, specifically the C runtime library, that treats CTRL-Z specially. In the case of the Microsoft C runtime library when it sees the CTRL-Z character in the buffer returned by ReadFile it treats it as an end-of-file marker and ignores everything else after it. Using the example in the previous paragraph, gets ends up calling ReadFile to get more input because the fact its seen the CTRL-Z character isn't remembered when reading from the console (or other device) and it hasn't yet seen the end-of-line (which was ignored). If you then press enter again, gets will return with the buffer filled with the 7 bytes house^Z\0 (adding a 0 byte to indicate the end of the string). By default, it does the much same thing when reading from normal files, if a CTRL-Z character appears in a file, it and everything after it is ignored. This is for backward-compatibility with CP/M which only supported files in lengths that were multiples of 128 and used CTRL-Z to mark where text files really were supposed to end.

    Note that both the Unix and Windows behaviours described above are only the normal default handling of user input. The Unix handling of CTRL-D only occurs when reading from a terminal device in canonical mode and it's possible to change the "end-of-file" character to something else. On Windows the operating system never treats CTRL-Z specially, but whether the C runtime library does or not depends on whether the FILE stream being read is in text or binary mode. This is why in portable programs you should always include the character b in the mode string when opening binary files (eg. fopen("foo.gif", "rb")).