Tags: c, standards, deprecated, c11, gets

Why was gets part of the C standard in the first place?


Every C programmer knows there is no way to securely use gets unless standard input is connected to a trusted source. But why didn't the developers of C notice such a glaring mistake before it was made an official part of the C standard? And why did it take until C11 to remove it from the standard and replace it with a function that performs bounds-checking? I'm aware fgets is typically used in its place, but that has the annoying habit of keeping the \n at the end.
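
For reference, the fgets workaround mentioned above can be wrapped so that the trailing \n is dropped. The following is a minimal sketch, not a standard facility: the helper name read_line and its signature are made up for illustration, and it assumes that silently truncating over-long lines is acceptable.

```c
#include <stdio.h>
#include <string.h>

/*
 * read_line: hypothetical helper (not part of the C standard library).
 * Reads at most size - 1 characters of one line from stream into buf
 * and removes the trailing newline that fgets keeps.
 * Returns buf on success, or NULL on end-of-file, read error, or size == 0.
 *
 * Note: if the input line is longer than the buffer, the remainder stays
 * in the stream and is returned by the next call.
 */
char *read_line(char *buf, size_t size, FILE *stream)
{
    if (size == 0 || fgets(buf, (int)size, stream) == NULL)
        return NULL;

    /* strcspn returns the index of the first '\n', or the string length
       if there is none, so this assignment is always within bounds. */
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

A call such as read_line(name, sizeof name, stdin) then behaves roughly like gets(name), except that it can never write past the end of name.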


Solution

  • The answer is simply that C is a very old language, dating back to the early 1970s. The sort of security threats we take for granted today weren't on the horizon when the language was first developed.

    For a long time, C was the in-house language at AT&T. It was difficult to find commercial compilers for C until the late 1970s. But when the UNIX operating system was rewritten in C, compilers became more readily available, and the language took off, especially after Kernighan and Ritchie's 1978 standard reference, The C Programming Language.

    Despite its widespread and growing popularity, the language itself wasn't standardized until 1989. By that point, C was nearly 20 years old and there was a lot of installed C code. The standards committee was relatively conservative; it worked on the assumption that the standard would codify existing practices rather than require new ways of doing things. The buffer overflow vulnerability of gets() seemed trivial compared to the cost of declaring a large portion of the installed code base nonstandard.

    The Morris Internet worm of 1988 did make clear the need for more secure coding practices, but even so, back in the late 1980s the Internet was still in its infancy. (If I remember correctly, an early 1990s Macintosh book by David Pogue answered the question of how to connect a Mac to the Internet with something to the effect of "Don't bother, the Internet isn't worth the effort".) One can hardly fault the standards committee for misjudging the exponential growth of the Internet and the attendant security threats.

    By the time the standard was revised in 1999, matters had of course changed. However, the committee again chose to be cautious about invalidating existing code: gets() stayed in C99, and it was later deprecated (in a 2007 technical corrigendum, if I have the date right) rather than removed altogether. It's debatable whether this was the right decision, but it wasn't obviously the wrong one.

    Retaining gets() in the C11 standard would obviously have been the wrong decision, and C11 very properly eliminates it. But your question rests on the assumption that removing it was "always already" the right thing to do, and from a historical perspective, that assumption seems questionable.