Are the standards saying that casting to wint_t and to wchar_t in the following two programs is guaranteed to be correct?
#include <locale.h>
#include <wchar.h>

int main(void)
{
    setlocale(LC_CTYPE, "");
    wint_t wc;
    wc = getwchar();
    putwchar((wchar_t) wc);
}
--
#include <locale.h>
#include <wchar.h>
#include <wctype.h>

int main(void)
{
    setlocale(LC_CTYPE, "");
    wchar_t wc;
    wc = L'ÿ';
    if (iswlower((wint_t) wc)) return 0;
    return 1;
}
Consider the case where wchar_t is signed short (this hypothetical implementation is limited to the BMP), wint_t is signed int, and WEOF == ((wint_t)-1). Then (wint_t)U+FFFF is indistinguishable from WEOF. Yes, U+FFFF is a reserved codepoint, but it's still wrong for it to collide.
I would not want to swear that this never happens in real life without an exhaustive audit of existing implementations.
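A minimal sketch of that collision, modelling the hypothetical types with fixed-width typedefs (hyp_wchar_t, hyp_wint_t and HYP_WEOF are illustrative names, not the real <wchar.h> types):

#include <stdint.h>
#include <stdio.h>

/* Hypothetical implementation: 16-bit signed wchar_t, 32-bit signed
   wint_t, WEOF == ((wint_t)-1). */
typedef int16_t hyp_wchar_t;
typedef int32_t hyp_wint_t;
#define HYP_WEOF ((hyp_wint_t)-1)

int main(void)
{
    /* U+FFFF does not fit in a signed 16-bit type; the conversion is
       implementation-defined and typically yields -1. */
    hyp_wchar_t wc = (hyp_wchar_t)0xFFFFu;
    hyp_wint_t wi = (hyp_wint_t)wc;   /* sign-extends: also -1 */

    printf("%d\n", wi == HYP_WEOF);   /* typically prints 1: collision */
    return 0;
}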
See also May wchar_t be promoted to wint_t?
On the environment you describe, wchar_t cannot accurately describe the BMP: L'\uFEFF' exceeds the range of wchar_t, as its type is the unsigned equivalent of wchar_t (C11 6.4.4.4 Character constants p9). Storing it into a wchar_t defined as signed short, assuming 16-bit shorts, changes its value.
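A sketch of that value change, using int16_t as a stand-in for the hypothetical 16-bit signed wchar_t (an assumption, not the real type of any particular implementation):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* 0xFEFF (U+FEFF) does not fit in a signed 16-bit type; the
       conversion is implementation-defined and typically yields -257. */
    int16_t wc = (int16_t)0xFEFFu;

    printf("expected 0x%X, stored %d\n", 0xFEFFu, (int)wc);
    return 0;
}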
On the other hand, if the charset used for the source code is Unicode and the compiler is properly configured to parse its encoding correctly, L'ÿ' has the value 255 with an unsigned type, so the code in the second example is perfectly defined and unambiguous.
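A quick way to check that reading, written with the \u00FF escape so the result does not depend on how the compiler decodes the source file (the iswlower result still depends on the locale selected at run time):

#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

int main(void)
{
    setlocale(LC_CTYPE, "");
    wchar_t wc = L'\u00FF';                /* same code point as L'ÿ' */

    printf("value: %ld\n", (long)wc);      /* 255 on the setup described */
    printf("iswlower: %d\n", iswlower((wint_t)wc) != 0);
    return 0;
}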
If int is 32 bits wide and short 16 bits wide, it seems much more consistent to define wchar_t as either int or unsigned short. WEOF can then be defined as (-1), a value different from all values of wchar_t, or at least from all values representing Unicode code points.
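A sketch of the second layout (16-bit unsigned wchar_t, wider signed wint_t), with hypothetical alt_* typedefs to show why WEOF then cannot collide with any BMP code point:

#include <stdio.h>

/* Illustrative typedefs only; a program cannot actually redefine the
   implementation's wchar_t, wint_t or WEOF. */
typedef unsigned short alt_wchar_t;   /* assumed 16 bits */
typedef int            alt_wint_t;    /* assumed 32 bits */
#define ALT_WEOF ((alt_wint_t)-1)

int main(void)
{
    alt_wchar_t wc = 0xFFFF;          /* U+FFFF is stored unchanged */
    alt_wint_t  wi = (alt_wint_t)wc;  /* zero-extends to 65535 */

    printf("%d\n", wi == ALT_WEOF);   /* prints 0: no collision */
    return 0;
}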