delphi winapi unicode console delphi-xe7

Is CP_UTF8 a supported codepage for WriteConsoleA / WriteFile?


There are many examples where people suggest using tricks similar to this to get Unicode console output:

begin
  OldConsoleOutputCP := GetConsoleOutputCP();
  SetConsoleOutputCP(CP_UTF8);
  try
    // Might also use WriteConsoleA, but this has drawbacks with output redirection
    WriteFile(GetStdHandle(STD_OUTPUT_HANDLE), Utf8Bytes, ...);
  finally
    // We better restore the output CP that was in use before our program started!
    SetConsoleOutputCP(OldConsoleOutputCP);
  end;
end.

This seems to work pretty well.

As far as I know, the MSDN documentation only ever says that you should use WriteConsoleW for console output and WriteFile for redirected output. (You can detect whether the handle is a console handle via the return value of GetConsoleMode and similar functions.)

Is it officially supported by Microsoft to use SetConsoleOutputCP(CP_UTF8) to output Unicode text to the console and redirected output? If yes, where is it documented?

I thought that the UTF-8 multibyte code page should only ever be used with the WideCharToMultiByte and MultiByteToWideChar functions?


Solution

  • Is it officially supported by Microsoft to use SetConsoleOutputCP(CP_UTF8) to output Unicode text to the console and redirected output?

    It certainly isn't explicitly supported, but it's difficult to say what counts as ‘supported’ here: this is an area where the documentation ranges from poor to wholly missing.

    In practice there are serious bugs with I/O, including WriteFile, when the console is on the other end and its code page is set to 65001. In general, Win32 I/O APIs (and the MSVCRT stdlib routines built on top of them) misbehave by returning counts that are supposed to be the number of bytes written or read but are actually the number of characters.

    It doesn't matter in your example because you're ignoring the lpNumberOfBytesWritten out parameter of WriteFile. Typically, though, once you use non-ASCII characters the wrong counts make callers believe a write or read was only partial, which results in mangled, repeated output on writing and in hangs waiting for more data on reading.
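
    To see why a wrong count produces repeated output, here is a minimal, platform-independent sketch. It does not call the console at all: BuggyWrite is a hypothetical stand-in that accepts every byte it is given but reports the number of characters, which is the behaviour described above. The retry loop around it is an assumption about how a typical caller handles what it takes to be a partial write.

```pascal
program Utf8CountMismatch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  // 'héllo' encoded as UTF-8: 6 bytes, 5 characters ('é' is $C3 $A9).
  Sample: array[0..5] of Byte = ($68, $C3, $A9, $6C, $6C, $6F);

// Counts characters (code points) in Count bytes of UTF-8 starting at Offset.
// Every byte that is not a continuation byte (10xxxxxx) starts a character.
function CountCodePoints(const Data: array of Byte;
  Offset, Count: Integer): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := Offset to Offset + Count - 1 do
    if (Data[I] and $C0) <> $80 then
      Inc(Result);
end;

// Hypothetical stand-in for the buggy write call (not a real API):
// it appends all requested bytes to Sink, but returns a character count
// where the caller expects a byte count.
function BuggyWrite(const Data: array of Byte; Offset, Count: Integer;
  var Sink: TBytes): Integer;
var
  OldLen: Integer;
begin
  OldLen := Length(Sink);
  SetLength(Sink, OldLen + Count);
  Move(Data[Offset], Sink[OldLen], Count);
  Result := CountCodePoints(Data, Offset, Count);
end;

var
  Sink: TBytes;
  Offset, Written: Integer;
begin
  Sink := nil;
  Offset := 0;
  // Typical "write everything" loop that trusts the reported count.
  while Offset < Length(Sample) do
  begin
    Written := BuggyWrite(Sample, Offset, Length(Sample) - Offset, Sink);
    if Written <= 0 then
      Break; // avoid spinning if no character start is left
    Inc(Offset, Written);
  end;
  WriteLn('bytes submitted: ', Length(Sample));
  WriteLn('bytes emitted:   ', Length(Sink));
end.
```

    The first call delivers all 6 bytes but reports 5, so the loop resends the final byte: 6 bytes are submitted and 7 are emitted, and the output ends in a duplicated 'o'. The more non-ASCII characters the text contains, the longer the resent tail becomes.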

    This is a bug in the console (conhost): it has special-case support to pass correct counts back to Windows for double-byte-character-set code pages that are used as the default code pages for any install locale (‘language for non-Unicode applications’), but not for other general-purpose multi-byte encodings.

    As linked by @IInspectable, at least one part of Microsoft have specifically declined to fix one visible aspect of this problem, albeit not the part of Microsoft that owns the root cause. Either way, it sadly doesn't look like an end to this long-standing and deeply frustrating problem is coming any time soon.

    // Might also use WriteConsoleA, but this has drawbacks with output redirection

    Yes, a common approach is to detect for yourself whether stdout is the console (e.g. using _isatty) and, in that case, branch to WriteConsoleW instead.
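
    A minimal Delphi sketch of that branch might look like the following. It assumes a Unicode Delphi (XE2 or later, where string is UnicodeString and the unit is named Winapi.Windows), it uses GetConsoleMode rather than _isatty for the detection, as mentioned in the question, and it picks UTF-8 for redirected output, which is this sketch's choice rather than anything Windows requires. WriteUnicode is a hypothetical helper name, and the return values of the write calls are ignored for brevity.

```pascal
program ConsoleUnicodeOut;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows;

// Hypothetical helper: writes S to standard output as UTF-16 through
// WriteConsoleW when stdout is a real console, and as UTF-8 bytes through
// WriteFile when stdout is redirected to a file or pipe.
procedure WriteUnicode(const S: string);
var
  StdOut: THandle;
  Mode, Written: DWORD;
  Utf8: UTF8String;
begin
  if S = '' then
    Exit;
  StdOut := GetStdHandle(STD_OUTPUT_HANDLE);
  if GetConsoleMode(StdOut, Mode) then
    // GetConsoleMode succeeds only for console handles.
    // The length passed here is in UTF-16 code units, not bytes.
    WriteConsoleW(StdOut, PChar(S), Length(S), Written, nil)
  else
  begin
    // Redirected output: no console code page is involved, so the bytes
    // arrive unchanged. UTF-8 is an assumption of this sketch.
    Utf8 := UTF8Encode(S);
    WriteFile(StdOut, Utf8[1], Length(Utf8), Written, nil);
  end;
end;

begin
  // Non-ASCII characters are given as code points so that the source
  // file's own encoding does not matter.
  WriteUnicode('Gr'#$00FC#$00DF'e, '#$4E16#$754C + sLineBreak);
end.
```

    With this approach the console output code page is never changed, so there is nothing to restore afterwards and the code page 65001 count problem is not triggered.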