
Problem with K&R C book regarding how scanf deals with blanks and tabs in the format string?


While reading the famous book The C Programming Language (second edition, ANSI C) by Brian Kernighan and Dennis Ritchie, I found in chapter 7 (section 7.4, page 157) the paragraph below, which describes the format string for scanf:

[...]

The format string usually contains conversion specifications, which are used to control conversion of input. The format string may contain:

  • Blanks or tabs, which are ignored.

[...]

But I remembered that nowadays a space in the format string tells scanf to skip white space in the input until it finds a non-white-space character. So I assumed that this paragraph is no longer valid because the C language has been updated over the years. Is that correct? 🤔


Solution

  • The C bible documents an obsolete version of scanf(). Early versions of scanf() ignored all white space in the input, so white space in the format string was ignored too. This behavior was changed well before C was standardized by ANSI and later by ISO.

    The cover of the second edition does mention ANSI C, but its description of scanf() is incorrect for ANSI C and later versions.

    As a matter of fact, the scanf man page from Version 7 Unix, released by Bell Labs in 1979, already documents the current behavior:

    The control string usually contains conversion specifications, which are used to direct interpretation of input sequences. The control string may contain:

    1. Blanks, tabs or newlines, which match optional white space in the input.
    2. An ordinary character (not %) which must match the next character of the input stream.
    3. Conversion specifications, consisting of the character %, an optional assignment suppressing character *, an optional numerical maximum field width, and a conversion character.

    No current C implementation supports the ancient behavior documented in the book. After researching this surprising mistake in K&R, it seems scanf() has had the current behavior almost from day one of the Unix system. scanf() has always been quirky and error-prone, and this finding adds to a long series of traps and pitfalls.
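
    To see that a blank in the format string is not simply ignored, here is a minimal sketch that uses sscanf() on a fixed string so it needs no keyboard input. The helper names parse_without_space and parse_with_space are made up for this illustration. Because %c, unlike %d, does not skip leading white space by itself, the blank in the format visibly changes which character is stored:

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Hypothetical helper: format "%d%c" has no blank between the directives,
       so %c stores whatever character immediately follows the number. */
    char parse_without_space(const char *input)
    {
        int n;
        char c = '?';
        if (sscanf(input, "%d%c", &n, &c) != 2)
            return '?';
        return c;
    }

    /* Hypothetical helper: format "%d %c" has a blank between the directives,
       which consumes any run of white space (possibly empty) before %c. */
    char parse_with_space(const char *input)
    {
        int n;
        char c = '?';
        if (sscanf(input, "%d %c", &n, &c) != 2)
            return '?';
        return c;
    }

    int main(void)
    {
        const char *input = "42   x";

        printf("without blank: '%c'\n", parse_without_space(input)); /* prints ' ' */
        printf("with blank:    '%c'\n", parse_with_space(input));    /* prints 'x' */

        /* The blank matches optional white space, so zero blanks is fine too. */
        assert(parse_with_space("42x") == 'x');
        return 0;
    }
    ```

    If blanks in the format were ignored, as the book says, both calls would store the same character. On a conforming implementation they do not.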

    You can find a list of errata correcting some errors in the second edition of the book, but this particular one is not listed.

    For further investigation, a lot of historical information can be found on Dennis Ritchie's home page, Brian Kernighan's page on the book, and here, and in the bitsavers.org archives.