
C++ When are characters widened in output stream operator<<()?


It seems to me that there is an inconsistency in the C++ standard, specifically in §30.7.5.2.4 of the C++17 draft (N4659), about when characters are widened in formatted output operations on output streams (operator<<()). Exactly the same inconsistency seems to be reflected on en.cppreference.com.

First, assume the following declarations:

std::ostream out;
std::wostream wout;
char ch;
wchar_t wch;
const char* str;
const wchar_t* wstr;

It is then stated that

  1. out << ch does not perform character widening,
  2. out << str performs character widening,
  3. wout << ch performs character widening,
  4. wout << str performs character widening,
  5. wout << wch does not perform character widening,
  6. wout << wstr performs character widening.

The first and most obvious inconsistency is that (6) cannot be true, as there is no widen() function taking a wchar_t argument, only one that takes a char argument.
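
To make the point about widen() concrete, here is a small compile-time sketch. The alias name widen_result_t is hypothetical and introduced only for this illustration; the check relies on std::basic_ios<CharT>::widen being declared as taking a char and returning the stream's character type.

```cpp
#include <ios>
#include <type_traits>
#include <utility>

// Hypothetical helper alias: the type returned by calling widen() with a
// char argument on a stream whose character type is CharT.
template <class CharT>
using widen_result_t =
    decltype(std::declval<const std::basic_ios<CharT>&>().widen(std::declval<char>()));

// widen() always takes a char and yields the stream's own character type.
static_assert(std::is_same<widen_result_t<char>, char>::value,
              "narrow stream: widen(char) yields char");
static_assert(std::is_same<widen_result_t<wchar_t>, wchar_t>::value,
              "wide stream: widen(char) yields wchar_t");
```

Since the argument is always a char, "widening" a wchar_t that is already in the stream's character type has no meaningful interpretation, which is the problem with statement (6).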

The second (seeming) inconsistency is between (1) and (2). It seems strange to me that out << "x" should widen 'x', while out << 'x' should not.

Am I misinterpreting the standard text, or is there something wrong there? If the latter is true, do you know what the intended behavior is?

EDIT: Apparently this inconsistency (if I am right) has been present in the standard since at least C++03 (§27.6.2.5.4). The text changes a bit through the intermediate standards, but the inconsistency, as I explain it above, remains.


Solution

  • It looks as if the standard isn't entirely correct. Most of the issue stems from the bulk specification of the respective operations: instead of handling each overload individually, similar overloads are described together, resulting in a misleading specification.

    I doubt any implementer has any trouble understanding what is intended, though. Essentially, when a char is inserted into a non-char stream, the character needs to be widen()ed to obtain the corresponding character of the stream's character type. This widening is intended to map one character from the source character set to one character in the stream's wide character set.
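
The intended behavior for a char going into a wide stream can be sketched as follows. The function names are hypothetical, and the sketch assumes the default "C" locale, in which widening a character of the basic source character set yields the corresponding wide character.

```cpp
#include <sstream>
#include <string>

// Insert a narrow char into a wide stream; operator<< is expected to
// widen it to the stream's character type before writing it.
std::wstring insert_narrow_char(char c) {
    std::wostringstream wout;
    wout << c;
    return wout.str();
}

// Insert a narrow string into a wide stream; each char is widened in turn.
std::wstring insert_narrow_string(const char* s) {
    std::wostringstream wout;
    wout << s;
    return wout.str();
}

// Call widen() directly on a wide stream for comparison.
wchar_t widen_explicitly(char c) {
    std::wostringstream wout;
    return wout.widen(c);
}
```

With these helpers, insert_narrow_char('x') should produce the one-character wide string L"x", and its single character should equal widen_explicitly('x').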

    Note that the IOStreams specification assumes the original notion of characters in streams being individual entities. Since the specification was created (for C++98), the text hasn't been updated substantially, but with the wide use of Unicode the "characters" in a stream are really bytes of an encoding. Although the streams mostly function OK in this modified environment, some flexibility that would help in dealing with Unicode characters isn't properly supported. The absence of something "widening" one character into a sequence of UTF-8 bytes is probably one example.

    If you feel the inconsistency/incorrectness in the streams section warrants addressing, file a defect report. Instructions on filing defect reports are at http://isocpp.org. When you do raise an issue, consider providing proposed wording to correct it. Since there is no lack of clarity about what is actually intended, and most implementations probably do the right thing anyway, I'd expect this issue to get fairly low priority, and without proposed wording it is unlikely to receive much attention. Of course, addressing the issue won't change the intended behavior, e.g., to "widen" chars into a UTF-8 sequence: that would effectively be a redesign of the streams library, which may be in order but won't be done as part of a defect resolution.