
How to display accented characters with a C++ program on all platforms?


I'm trying to port a C++11 program from Windows to Linux (GCC 4.9). Originally, I just set the locale inside the program:

setlocale(LC_ALL, "");

However, this left some characters missing on Linux (latest version of Linux Mint). I then saved all my source files as UTF-8, which fixed the problem on Linux, but now all the accented characters are garbled on Windows.

If that helps, the language is French. Is there any way to display the text correctly on both platforms without too much trouble?

I'd appreciate help, thank you.

void EcranBienvenue()
{
    char coinHG = (char)201;
    char coinHD = (char)187;
    char coinBG = (char)200;
    char coinBD = (char)188;
    char ligneH = (char)205;
    char ligneV = (char)186;
#ifdef _WIN32
    system("cls");
#elif defined __linux__
    system("clear");
#else
    cout << string(20,'\n');
#endif
    setlocale(LC_ALL, "C");
    cout << coinHG;
    for (int i = 0; i < 48; i++)
        cout << ligneH;
    cout << coinHD << endl;
    cout << ligneV << "                                                " << ligneV << endl;
    cout << ligneV << "     Les productions                 inc        " << ligneV << endl;
    cout << ligneV << "                                                " << ligneV << endl;
    cout << ligneV << "     Système de gestion des abonnements         " << ligneV << endl;
    cout << ligneV << "                                                " << ligneV << endl;
    cout << coinBG;
    for (int i = 0; i < 48; i++)
        cout << ligneH;
    cout << coinBD << endl;
    setlocale(LC_ALL, "");

}

It's expected that the border doesn't work on Linux yet. However, the three lines of text are displayed correctly in the terminal.

On Windows, "è" is displayed as an incorrect character.

Syst├¿me de gestion des abonnements

Solution

  • There are lots of different ways to do this sort of thing, but some of them are definitely bad. A few things I strongly recommend avoiding:

    • Do not change the global C or C++ locales, ever. For the most part, just avoid locales altogether.
    • Do not use wchar_t (except hidden inside APIs you implement across platforms; use wchar_t only in your Windows implementation).
    • Don't use legacy encodings except where absolutely required. (Legacy encodings are everything except UTF-8, UTF-16 and UTF-32.)

    The problems you're seeing are because you're passing text data between interfaces using the wrong encodings.

    For example:

    Syst├¿me de gestion des abonnements
    

    This happens because you're passing UTF-8 encoded text to an interface that expects data encoded with (probably) Microsoft's code page 850, your console's OEM code page.

    You need to know what encoding an interface requires in order to use it. You also need to know what encoding your data is in. To that end, choose one consistent encoding for use inside your code, and at interface boundaries convert other data to and from that encoding as necessary. I believe UTF-8 is the best choice for cross-platform code.
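
To make the mismatch concrete, here is a small sketch that spells the letter "è" as explicit bytes in three encodings and dumps them in hex. The helper `hex_bytes` and the constant names are made up for this illustration; the byte values themselves are the standard ones for UTF-8, code page 850 and Windows-1252.

```cpp
#include <cstdio>
#include <string>

// Render each byte of `s` as two uppercase hex digits, separated by spaces.
// (Hypothetical helper, only for this illustration.)
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02X", c);
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}

// The same letter "è" in three encodings, written with byte escapes so the
// values do not depend on how this source file happens to be saved.
const std::string e_grave_utf8   = "\xC3\xA8"; // UTF-8: two bytes
const std::string e_grave_cp850  = "\x8A";     // OEM code page 850: one byte
const std::string e_grave_cp1252 = "\xE8";     // Windows-1252: one byte
```

A console that interprets output as code page 850 shows the two UTF-8 bytes C3 A8 as the two characters ├¿, which is why UTF-8 text looks garbled there.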


    Due to shortcomings in MSVC's implementation of the standard C and C++ IO facilities, you are probably best off implementing your own IO API with a native Win32 implementation.
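
As a rough idea of what such an API could look like, here is a minimal sketch of a single output function that takes UTF-8 everywhere. The name `print_utf8` is hypothetical and this is not the code from the page linked below. It assumes that on Windows, when stdout is a real console, converting to UTF-16 and calling `WriteConsoleW` is acceptable, and that redirected output should receive the UTF-8 bytes unchanged.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif
#include <cstdio>
#include <string>

// Write UTF-8 text to standard output. Returns true on success.
// (Hypothetical function name; a sketch, not a complete IO layer.)
bool print_utf8(const std::string& utf8) {
    if (utf8.empty()) return true;
#ifdef _WIN32
    std::fflush(stdout); // keep ordering with anything written through stdio
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    if (GetConsoleMode(out, &mode)) {
        // Real console: convert UTF-8 to UTF-16 and use the wide console API.
        const int len = static_cast<int>(utf8.size());
        const int n = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), len, nullptr, 0);
        if (n <= 0) return false;
        std::wstring wide(static_cast<std::size_t>(n), L'\0');
        MultiByteToWideChar(CP_UTF8, 0, utf8.data(), len, &wide[0], n);
        DWORD written = 0;
        return WriteConsoleW(out, wide.data(), static_cast<DWORD>(wide.size()),
                             &written, nullptr) != 0;
    }
    // Redirected to a file or pipe: pass the UTF-8 bytes through unchanged.
    DWORD written = 0;
    return WriteFile(out, utf8.data(), static_cast<DWORD>(utf8.size()),
                     &written, nullptr) != 0;
#else
    // Most Linux terminals already expect UTF-8, so write the bytes as-is.
    return std::fwrite(utf8.data(), 1, utf8.size(), stdout) == utf8.size();
#endif
}
```

Calling code then stays identical on both platforms, for example `print_utf8("Syst\xC3\xA8me de gestion des abonnements\n");`, and no locale has to be changed.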

    Here's a page that talks about implementing output functionality on Windows.

    The print function implemented in this article takes wchar_t input. Here's one way to convert UTF-8 to UTF-16/wchar_t:

    #include <codecvt>
    #include <locale>
    
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> convert;
    
    std::string str = "Système de gestion des abonnements"; // UTF-8 encoded data
    UPrint(convert.from_bytes(str).c_str()); // convert to UTF-16 before printing
    

    Additionally, you could implement a streambuf that correctly handles writing to the Windows console and install it in std::cout in place of the default one, so that printing to cout prints correctly to the console. Remember to restore the original streambuf before exiting so that cout can be destroyed cleanly. An RAII-type object can handle both installing the stream buffer and switching it back later.

    Such a program might look like:

    int main() {
      Set_utf8_safe_streambuf buffer_swapper(std::cout); // on windows swaps cout's streambuf with one that can print UTF-8 to the console, does nothing on other platforms
    
      std::cout << "Système de gestion des abonnements" << '\n'; // utf-8 data
    }
    

    Here's an answer with a few details on implementing and swapping a streambuf.