I have a C program with this main() function:
int main(int argc, char *argv[])
{
    FILE *f = fopen(argv[1], "r");
    ...
}
Notice that it expects, when executing the program, a filename be provided as the first argument, e.g.,
main test.dat
The program works fine when I run it that way.
Interestingly, the program also works fine when I run it this way:
cat test.dat | main
That is not providing main() with a filename. It is streaming the content of test.dat to main(). Right? So how does it work?
Further elaboration: The main() function is the main in a Bison parser. I show the main() function below. As I mentioned, the parser works fine whether I invoke it this way:
main test.dat
or this way:
cat test.dat | main
Here is the parser's main() function:
int main(int argc, char *argv[])
{
    yyin = fopen(argv[1], "r");
    yyparse();
    fclose(yyin);
    return 0;
}
The fundamental problem is that you don't verify that fopen worked. Every call to fopen() should be followed by a check that the return value was not NULL. Otherwise, you will never notice that a user misspelled a filename, for example.
Normally, passing a NULL FILE* argument to a stdio function is Undefined Behaviour, which typically results in a segfault. That doesn't happen with yyin because the NULL is never passed through to stdio; the flex scanner notices that yyin is NULL and converts it to stdin. It does that because stdin is the default input source (as per the Posix standard). Similarly, a NULL yyout is treated as though it were stdout.
It's probably OK to rely on this behaviour from Flex. But it should only be used deliberately, not accidentally.
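The defaulting described above can be sketched in a few lines of C. This is only an illustration of the idea, not Flex's actual generated code; the helper names effective_input and effective_output are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helpers that mirror the defaulting a Flex scanner
 * applies to yyin and yyout: a NULL stream is replaced by the
 * corresponding standard stream, anything else is used unchanged. */
static FILE *effective_input(FILE *in)
{
    return in != NULL ? in : stdin;
}

static FILE *effective_output(FILE *out)
{
    return out != NULL ? out : stdout;
}
```

With this model, a failed fopen() that leaves yyin as NULL behaves exactly like never setting yyin at all, which is why `cat test.dat | main` appears to work.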
If your application is invoked with no command-line arguments, then argc will be 1, argv[0] will be the name used to invoke the program, and argv[1] will be NULL. (Technically, argc could be 0, with even worse consequences, but that's unlikely in practice.) You then pass that NULL to fopen, which is Undefined Behaviour (that is to say, a grievous error). The implementation of fopen in your standard library returns an error indication rather than segfaulting [Note 1], but as noted above you don't check for this error return. So the compounding of errors happens to result in yyin being NULL, and Flex reading from stdin.
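One way to make the argc/argv rules explicit is to look at argc before touching argv[1]. The following is a minimal sketch; first_argument is a hypothetical helper name, not something from your program.

```c
#include <stddef.h>

/* Hypothetical helper: returns the first command-line argument, or
 * NULL when none was supplied. Testing argc first means argv[1] is
 * never read when argc is 0, where argv[1] is not guaranteed to exist. */
static const char *first_argument(int argc, char *argv[])
{
    return (argc > 1) ? argv[1] : NULL;
}
```

A caller can then decide deliberately what a NULL result means (for example, "read from stdin") instead of handing it to fopen.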
You should always check for validity of user input. Always. Without exception. And you should report errors, or deal with them. There are no excuses. Not checking is dangerous, and at best wastes a lot of time; yours and that of whoever you enlist to help you.
Correct code might look like this:
/* Needs <errno.h>, <stdio.h>, <stdlib.h> and <string.h>. */
if (argc > 1) {
    yyin = fopen(argv[1], "r");
    if (yyin == NULL) {
        fprintf(stderr, "Could not open file '%s': %s\n",
                argv[1], strerror(errno));
        exit(1);
    }
}
else {
    /* argc <= 1, so there was no command line argument.
     * Read from stdin.
     */
    yyin = stdin;
}
Note 1: Most stdio libraries on Unix-like systems implement fopen by first calling the Posix-defined open function. The filename is simply passed through, so it's not examined at all. open is usually a system call, so it's executed in kernel mode; that requires it to copy the filename from user memory to kernel memory, which in turn requires it to first validate the address. So on Unix, passing an invalid string pointer to fopen is likely to produce some kind of error indication. This is not required by any standard, and there is no specification of the errno code to use. It might not be the case on non-Posix platforms, where it's quite possible that fopen needs to transform the filepath in some way prior to passing it to the native file system. (For example, it might need to translate / directory separators to something else.) On such systems, it is quite likely that the filename argument will not be checked for validity, and the fopen library function will segfault (or equivalent) when it tries to use an invalid filename pointer.
On most common Unix stdio library implementations, fopen will segfault if the mode argument is specified as NULL. Like all library functions, fopen is under no obligation to cope with NULL pointer arguments; the C standard insists that it is undefined behaviour to pass NULL as a pointer argument to any library function unless that library function is explicitly documented as accepting NULL for that argument. (See, for example, free, realloc, and strtok for library functions which explicitly allow NULL.) fopen is not such a function, so you shouldn't pass NULL as any argument, and you certainly shouldn't assume that the result will just be an error return.
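If you want the "NULL argument gives an error return" behaviour on every platform, you have to provide it yourself. Here is a minimal sketch of such a wrapper; checked_fopen is a hypothetical name, and using EINVAL (a Posix error code, not one the C standard requires) is this sketch's own choice.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical wrapper: rejects NULL arguments itself instead of
 * passing them to fopen(), so the failure is a defined error return
 * rather than Undefined Behaviour. EINVAL is an assumption of this
 * sketch; no standard specifies an errno value for this case. */
static FILE *checked_fopen(const char *path, const char *mode)
{
    if (path == NULL || mode == NULL) {
        errno = EINVAL;
        return NULL;
    }
    return fopen(path, mode);
}
```

The caller still has to test the result for NULL, exactly as with plain fopen.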