Tags: c, while-loop, char, eof, integer-promotion

Why does reading a file with while ((c = getc(file)) != EOF) work on some platforms?


It is discussed in another thread that reading a file with the following code can result in an infinite loop, because EOF is an integer outside the range of char and the while condition therefore can never become false:

FILE* f;
char c;
std::string s;

f = fopen("/some/file", "r");
while ((c = getc(f)) != EOF)
  s += c;

Actually, I have observed that this results in an infinite loop under Linux on a Raspberry Pi, but works fine under Linux on different hardware.

Why does it work at all?

The only explanation that comes to mind is that the value of the assignment expression (c = getc(f)) is not well defined: it yields the left-hand value (a char in this case) on some platforms and the right-hand value (an int in this case) on others.

Can someone shed light on this behaviour?


Solution

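  • It depends on how the type char behaves: as signed char or as unsigned char. The value of the assignment expression (c = getc(f)) is well defined: it is the value of c after the assignment and has type char. In the comparison, that char is converted to int by the integer promotions. If char behaves as unsigned char, the promoted value is never negative and so can never equal EOF, which is usually defined as -1. If char behaves as signed char, the promoted value can be -1 and therefore equal to EOF.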

    Try the following demonstration program.

    #include <stdio.h>
    
    int main(void) 
    {
        unsigned char c1 = EOF;
        signed char c2 = EOF;
        
        printf( "c1 == EOF is %s\n", c1 == EOF ? "true" : "false" );
        printf( "c2 == EOF is %s\n", c2 == EOF ? "true" : "false" );
        
        return 0;
    }
    

    Its output is

    c1 == EOF is false
    c2 == EOF is true
    

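    Whether the type char behaves as signed char or as unsigned char is implementation-defined: it depends on the compiler and the target platform, and can often be changed with compiler options (for example, GCC's -fsigned-char and -funsigned-char). With GCC, char is unsigned by default on ARM Linux targets such as the Raspberry Pi and signed on x86, which matches the behaviour observed in the question.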
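
    To see which behaviour a given compiler and target use, one option is to look at CHAR_MIN from <limits.h>. The following is a minimal sketch; the function names plain_char_is_signed and char_can_equal_eof are made up for illustration, and the second function assumes EOF is -1, as it is on common implementations.

    ```c
    #include <limits.h>
    #include <stdio.h>

    /* Returns 1 if plain char is signed on this implementation, 0 otherwise.
       CHAR_MIN is 0 when char behaves as unsigned char and negative
       when it behaves as signed char. */
    int plain_char_is_signed(void)
    {
        return CHAR_MIN < 0;
    }

    /* Mirrors the comparison in the question's loop: EOF is stored in a
       plain char, which is then promoted back to int and compared with EOF.
       Assumption: EOF is -1. */
    int char_can_equal_eof(void)
    {
        char c = (char)EOF;
        return c == EOF;
    }
    ```

    Under that assumption both functions return the same value: 1 where char is signed (the loop from the question terminates) and 0 where it is unsigned (the loop never ends).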

    From the C Standard (6.2.5 Types)

    15 The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.

    For example, as far as I know, on IBM mainframes the type char behaves as unsigned char by default.
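
    Either way, the usual fix is to store the result of getc in an int, which is the type getc returns. Even where char is signed and the original loop terminates, a data byte with the value 0xFF becomes -1 when stored in a char and is then mistaken for EOF, so the loop can stop early. Below is a sketch in plain C of such a loop; read_all is a hypothetical helper, not something from the question, and it uses a growing buffer in place of std::string.

    ```c
    #include <stdio.h>
    #include <stdlib.h>

    /* Reads the rest of stream f into a malloc'd, NUL-terminated buffer and
       stores the number of bytes read in *len. Returns NULL if memory
       allocation fails. The caller frees the result. */
    char *read_all(FILE *f, size_t *len)
    {
        size_t cap = 64, n = 0;
        char *buf = malloc(cap);
        int c;  /* int, not char: holds every unsigned char value plus EOF */

        if (buf == NULL)
            return NULL;

        while ((c = getc(f)) != EOF) {
            if (n + 1 >= cap) {  /* keep room for the terminating NUL */
                char *tmp = realloc(buf, cap * 2);
                if (tmp == NULL) {
                    free(buf);
                    return NULL;
                }
                buf = tmp;
                cap *= 2;
            }
            buf[n++] = (char)c;
        }

        buf[n] = '\0';
        *len = n;
        return buf;
    }
    ```

    Because c is an int, the comparison with EOF is done on the value getc actually returned, so the result does not depend on whether char is signed or unsigned.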