I am learning C++ using the books listed here. In particular, I read here that:
If the value represented by a single hexadecimal escape sequence does not fit the range of values represented by the character type used in this string literal (char, char8_t (since C++20), char16_t, char32_t (since C++11), or wchar_t), the result is unspecified.
(emphasis mine)
This means that, in a system where char is signed, the result of '\xe4' will be unspecified. But here the person says that "it is implementation defined and not unspecified".
So, my question: Is the behavior of the statements below unspecified or implementation-defined? That is, is this an error in cppreference's documentation, or have I misunderstood it?
char arr[] = {'\xe4','\xbd','\xa0','\xe5','\xa5','\xbd','\0'}; //unspecified or implementation defined
char ch = '\xef'; //unspecified or implementation defined
This can be either implementation defined (as per C++17) or (probably) well defined (as per C++23).
In C++17 (or earlier?), according to this Draft Standard:
5.13.3 Character literals [lex.ccon]
…
8 … The value of a character literal is implementation-defined if it falls outside of the implementation-defined range defined for char (for character literals with no prefix) or wchar_t (for character literals prefixed by L). …
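To see what a particular implementation actually chooses under the C++17 wording, a small probe like the following may help. It is only a sketch: the helper names (plain_char_is_signed, xe4_as_int, print_xe4_report) are made up for illustration, and the assertion inside it reflects what mainstream compilers with an 8-bit char do, not something C++17 guarantees.

```cpp
#include <cassert>
#include <cstdio>
#include <limits>

// Hypothetical helper: true if plain char is a signed type on this implementation.
constexpr bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}

// Hypothetical helper: the value this implementation gives '\xe4', widened to int.
constexpr int xe4_as_int() {
    return static_cast<int>('\xe4');
}

// Prints what this implementation chose for the literal.
void print_xe4_report() {
    // Assumption (not guaranteed by C++17): the stored bit pattern is still 0xE4.
    assert(static_cast<unsigned char>(xe4_as_int()) == 0xE4);
    std::printf("char is %s, '\\xe4' has value %d\n",
                plain_char_is_signed() ? "signed" : "unsigned",
                xe4_as_int());
}
```

Calling print_xe4_report() from main shows whether char is signed and which value the literal received on the compiler and target in use.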
However, from this Draft C++23 Standard (also §5.13.3, [lex.ccon]):
3.2.3 Otherwise, if the character-literal's encoding-prefix is absent or L, and v does not exceed the range of representable values of the corresponding unsigned type for the underlying type of the character-literal's type, then the value is the unique value of the character-literal's type T that is congruent to v modulo 2^N, where N is the width of T.
So, in your case, as long as the value of the escape sequence is representable by an unsigned char, then there is neither undefined nor implementation-defined behaviour, as of C++23. However, if that value is outside the range of that unsigned equivalent, then the literal is ill-formed:
3.2.4 Otherwise, the character-literal is ill-formed.
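The C++23 rule quoted above can be modelled by hand. The following is a rough sketch, assuming an 8-bit char; the function name congruent_char_value is hypothetical and simply mirrors the "congruent to v modulo 2^N" wording for an unprefixed character literal.

```cpp
#include <cassert>
#include <climits>
#include <limits>

static_assert(CHAR_BIT == 8, "this sketch assumes an 8-bit char");

// Hypothetical model of the C++23 rule: returns the value of type char
// (widened to int) that is congruent to v modulo 2^N, where N = CHAR_BIT.
constexpr int congruent_char_value(unsigned v) {
    constexpr unsigned modulus = 1u << CHAR_BIT;  // 2^N, i.e. 256 here
    // Larger values would make the literal ill-formed, so they are not modelled.
    assert(v < modulus);
    if (std::numeric_limits<char>::is_signed && v > static_cast<unsigned>(SCHAR_MAX)) {
        // Signed char: wrap values above SCHAR_MAX into the negative range.
        return static_cast<int>(v) - static_cast<int>(modulus);
    }
    // Unsigned char, or a value that already fits: v itself is the answer.
    return static_cast<int>(v);
}
```

With a signed 8-bit char, congruent_char_value(0xE4) yields -28 (228 - 256); with an unsigned char it yields 228. A literal such as '\x100' exceeds the range of unsigned char, so under the C++23 wording it would be ill-formed rather than wrapped.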
Note: This C++20 Draft Standard has the same clause as the above-cited C++17 version (although it's paragraph 7, rather than 8).