Search code examples
ciotype-conversionportability

Best way to portably assign the result of fgetc() to a char in C


Perhaps I'm overthinking this, as it seems like it should be a lot easier. I want to take a value of type int, such as is returned by fgetc(), and record it in a char buffer if it is not an end-of-file code. E.g.:

char buf;
int c = fgetc(stdin);

if (c < 0) {
    /* handle end-of-file */
} else {
    buf = (char) c;  /* not quite right */
}

However, if the platform has signed default chars then the value returned by fgetc() may be outside the range of char, in which case casting or assigning it to (signed) char produces implementation-defined behavior (right?). Surely, though, there is tons of code out there that does exactly the equivalent of the example. Is it all relying on implementation-defined behavior and/or assuming 7-bit data?

It looks to me like if I want to be certain that the behavior of my code is defined by C to be what I want, then I need to do something like this:

buf = (char) ((c > CHAR_MAX) ? (c - (UCHAR_MAX + 1)) : c);

I think that produces defined, correct behavior whether default chars are signed or unsigned, and regardless even of the size of char. Is that right? And is it really needful to do that to ensure portability?


Solution

  • fgetc() returns unsigned char and EOF. EOF is always < 0. If the system's char is signed or unsigned, it makes no difference.

    C11dr 7.21.7.1 2

    If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined).

    The concern I have about is that is looks to be 2's compliment dependent and implying the range of unsigned char and char are both just as wide. Both of these assumptions are certainly nearly always true today.

    buf = (char) ((c > CHAR_MAX) ? (c - (UCHAR_MAX + 1)) : c);

    [Edit per OP comment]
    Let's assume fgetc() returns no more different characters than stuff-able in the range CHAR_MIN to CHAR_MAX, then (c - (UCHAR_MAX + 1)) would be more portable is replaced with (c - CHAR_MAX + CHAR_MIN). We do not know (c - (UCHAR_MAX + 1)) is in range when c is CHAR_MAX + 1.

    A system could exist that has a signed char range of -127 to +127 and an unsigned char range 0 to 255. (5.2.4.2.1), but as fgetc() gets a character, it seems to have all be unsigned char or all ready limited itself to the smaller signed char range, before converting to unsigned char and return that value to the user. OTOH, if fgetc() returned 256 different characters, conversion to a narrow ranged signed char would not be portable regardless of formula.