Perhaps I'm overthinking this, as it seems like it should be a lot easier. I want to take a value of type int, such as is returned by fgetc(), and record it in a char buffer if it is not an end-of-file code. E.g.:
char buf;
int c = fgetc(stdin);
if (c < 0) {
    /* handle end-of-file */
} else {
    buf = (char) c; /* not quite right */
}
However, if the platform has signed default chars then the value returned by fgetc() may be outside the range of char, in which case casting or assigning it to (signed) char produces implementation-defined behavior (right?). Surely, though, there is tons of code out there that does exactly the equivalent of the example. Is it all relying on implementation-defined behavior and/or assuming 7-bit data?
It looks to me like if I want to be certain that the behavior of my code is defined by C to be what I want, then I need to do something like this:
buf = (char) ((c > CHAR_MAX) ? (c - (UCHAR_MAX + 1)) : c);
I think that produces defined, correct behavior whether default chars are signed or unsigned, and regardless even of the size of char. Is that right? And is it really needful to do that to ensure portability?
fgetc() returns an unsigned char value converted to int, or EOF. EOF is always < 0. Whether the system's char is signed or unsigned makes no difference.
C11dr 7.21.7.1 2
If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined).
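In other words, test the int result against EOF first and only then store it into the buffer. A minimal sketch of that idiom (the buffer name and size here are just illustrative):

#include <stdio.h>

int main(void) {
    char buf[256];              /* illustrative buffer */
    size_t n = 0;
    int c;                      /* int, not char, so EOF stays distinguishable */

    while (n < sizeof buf - 1 && (c = fgetc(stdin)) != EOF) {
        buf[n++] = (char) c;    /* c originated as an unsigned char value */
    }
    buf[n] = '\0';
    printf("read %zu bytes\n", n);
    return 0;
}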
The concern I have about the following is that it looks to be 2's complement dependent and implies the ranges of unsigned char and char are both just as wide. Both of those assumptions are certainly nearly always true today.

buf = (char) ((c > CHAR_MAX) ? (c - (UCHAR_MAX + 1)) : c);
[Edit per OP comment]
Let's assume fgetc() returns no more distinct characters than are stuff-able in the range CHAR_MIN to CHAR_MAX; then (c - (UCHAR_MAX + 1)) would be more portable if replaced with (c - CHAR_MAX + CHAR_MIN). We do not know that (c - (UCHAR_MAX + 1)) is in range when c is CHAR_MAX + 1.
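A sketch of that adjustment (to_char is a hypothetical helper name, not the answer's code; it assumes c is a non-EOF value from fgetc() and that the stream carries no more distinct characters than fit in CHAR_MIN to CHAR_MAX):

#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: keep a non-EOF fgetc() result within char range
   using the (c - CHAR_MAX + CHAR_MIN) form from above. */
static char to_char(int c) {
    return (char) ((c > CHAR_MAX) ? (c - CHAR_MAX + CHAR_MIN) : c);
}

int main(void) {
    int c = fgetc(stdin);
    if (c != EOF) {
        char ch = to_char(c);   /* stored without an out-of-range signed conversion */
        (void) ch;
    }
    return 0;
}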
A system could exist that has a signed char range of -127 to +127 and an unsigned char range of 0 to 255 (5.2.4.2.1), but as fgetc() gets a character, it seems to have either been all unsigned char or already limited itself to the smaller signed char range before converting to unsigned char and returning that value to the user. OTOH, if fgetc() returned 256 different characters, conversion to a narrower-ranged signed char would not be portable regardless of formula.
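When in doubt about a particular implementation, the ranges in question are easy to inspect directly via <limits.h>; a small diagnostic like this shows whether unsigned char really spans twice the positive char range plus one:

#include <limits.h>
#include <stdio.h>

int main(void) {
    /* Print the character-type ranges this implementation uses. */
    printf("CHAR_MIN  = %d\n", CHAR_MIN);
    printf("CHAR_MAX  = %d\n", CHAR_MAX);
    printf("UCHAR_MAX = %u\n", (unsigned) UCHAR_MAX);
    /* On common 8-bit two's complement systems: -128, 127, 255. */
    return 0;
}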