c language-lawyer binaryfiles fgetc

Is it possible to confuse EOF with a normal byte value when using fgetc?


We often use fgetc like this:

int c;
while ((c = fgetc(file)) != EOF)
{
    // do stuff
}

Theoretically, if a byte in the file has the value of EOF, this code is buggy - it will break the loop early and fail to process the whole file. Is this situation possible?

As far as I understand, fgetc internally casts a byte read from the file to unsigned char and then to int, and returns it. This will work if the range of int is greater than that of unsigned char.
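To make the conversion described above concrete, here is a minimal sketch. The helper names `as_fgetc_would_return` and `eof_is_unambiguous` are hypothetical, introduced only for illustration; they model the unsigned char to int step and are not how any particular C library implements fgetc.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: models the value fgetc hands back for one byte,
 * i.e. the byte as an unsigned char converted to int.
 * If UCHAR_MAX > INT_MAX, this conversion is implementation-defined
 * for values above INT_MAX. */
int as_fgetc_would_return(unsigned char byte)
{
    return (int)byte;
}

/* Hypothetical helper: nonzero when every unsigned char value fits in a
 * non-negative int. EOF is negative, so in that case no data byte can
 * compare equal to EOF. */
int eof_is_unambiguous(void)
{
    return UCHAR_MAX <= INT_MAX;
}
```

On a platform with 8-bit char and a wider int, the byte 0xFF comes back as 255, which cannot collide with the negative EOF. The question only becomes interesting when `eof_is_unambiguous()` returns 0.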

What happens if it's not (probably then sizeof(int) == 1)?

  • Will fgetc sometimes read legitimate data equal to EOF from a file?
  • Will it alter the data it read from the file to avoid the single value EOF?
  • Will fgetc be an unimplemented function?
  • Will EOF be of another type, like long?

I could make my code fool-proof by an extra check:

int c;
for (;;)
{
    c = fgetc(file);
    if (feof(file))
        break;
    // do stuff
}

Is it necessary if I want maximum portability?


Solution

  • Yes, checking feof(file) right after c = fgetc(file) does work for maximum portability. It works in general, and also when unsigned char and int have the same number of distinct values. This occurs on rare platforms where char, signed char, unsigned char, short, unsigned short, int, and unsigned all have the same bit width and the same number of representable values.

    Note that feof(file) alone is insufficient. On a read error, fgetc returns EOF without setting the end-of-file indicator, so code should also check for ferror(file).

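    int c;
    for (;;)
    {
        c = fgetc(file);
        if (c == EOF) {
            if (feof(file)) break;    // real end of file
            if (ferror(file)) break;  // read error
            // neither indicator is set: c is a data byte whose value equals EOF
        }
        // do stuff
    }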
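As a self-contained illustration of the loop above, the following sketch wraps it in a function that counts the bytes of a stream. The name `count_bytes` and the `read_error` out-parameter are assumptions made for this example and do not come from the original answer.

```c
#include <stdio.h>

/* Hypothetical helper: counts every byte readable from fp.
 * Sets *read_error to 1 if the loop stopped on a read error, else 0.
 * A return value of EOF from fgetc is trusted as "no more data" only
 * when feof() or ferror() confirms it. */
long count_bytes(FILE *fp, int *read_error)
{
    long count = 0;

    *read_error = 0;
    for (;;) {
        int c = fgetc(fp);
        if (c == EOF) {
            if (feof(fp))
                break;              /* genuine end of file */
            if (ferror(fp)) {
                *read_error = 1;    /* read failure, not end of file */
                break;
            }
            /* Neither indicator is set: c is data that equals EOF. */
        }
        count++;
    }
    return count;
}
```

A caller would open the file in binary mode ("rb"), call `count_bytes`, and inspect `read_error` afterwards to distinguish a complete read from one cut short by an I/O error.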