In Windows, the value of the Unicode character ö
(Latin small letter o with diaeresis) in the CP437 character set is 148
.
In Linux, the byte value for ö
in the UTF-8 encoding is:
-61(Hi Byte)
-74(Lo Byte)
(unsigned value = 46787)
My Question is, how can I convert from 148
from CP437 to UTF-8 in C++ on Linux?
The detailed info for my problem lies here:
open() function in Linux with extended characters (128-255) returns -1 error
Temporary solution:
C++11 supports the conversion to UTF-8 using codecvt_utf8
On Windows, you can use the Win32 MultiByteToWideChar()
function to convert data from CP437 to UTF-16, and then use the WideCharToMultiByte()
function to convert data from UTF-16 to UTF-8.
On Linux, you can use a Unicode conversion library, like libiconv or ICU (which are available for Windows, too).
In C++11 and later, you can use std::wstring_convert
to:
convert from CP437 to either UTF-16 or UTF-32/UCS-4 (if you can get/make a codecvt
for CP437, that is).
then, convert from UTF-16 or UTF-32/UCS-4 to UTF-8.
You can't use codecvt_utf8
to convert from CP437 to UTF-8 directly. It only supports conversions between:
UTF-8 and UCS-2 (not UTF-16!)
UTF-8 and UTF-32/UCS-4.
You have to use codecvt_utf8_utf16
for conversions between UTF-8 and UTF-16.
Or, you can use mbrtoc16()
to convert CP437 to UTF-16 using a CP437 locale, and then use c16rtomb()
to convert UTF-16 to UTF-8 using a UTF-8 locale (if your STL library implements a fix for DR488, otherwise c16rtomb()
only supports UCS-2 and not UTF-16!).
Otherwise, just create your own CP437-to-UTF8 lookup table for the 256 possible CP437 bytes, and then do the conversion manually, one byte at a time.