Tags: c++, c++20, getline, wstring, wifstream

Why does std::getline() seem to mess up accented characters?


I am trying to use and display French accented characters in my C++20 program.

However, reading a line from a file with std::getline() seems to mangle the accented characters:

#include <clocale>
#include <locale>
#include <iostream>
#include <fstream>

int main(void)
{
    std::setlocale(LC_ALL, "");
    std::wifstream  file("test.txt");
    std::wstring    s;
    std::getline(file, s);
    std::wcout << s << std::endl;
    return 0;
}

Content of test.txt (encoded in UTF-8):

Salut ! Comment ça va ? éèêëâàäáôûöüùîï

Result:

$>./test
Salut ! Comment ça va ? éèêëâà äáôûöüùîï

However, when I put the same text directly in a wide string literal, it displays correctly:

#include <clocale>
#include <locale>
#include <iostream>

int main(void)
{
    std::setlocale(LC_ALL, "");
    std::wstring    s = L"Salut ! Comment ça va ? éèêëâàäáôûöüùîï";
    std::wcout << s << std::endl;
    return 0;
}

Result:

$>./test
Salut ! Comment ça va ? éèêëâàäáôûöüùîï

Calling setlocale(LC_ALL, "") improved things (before I added it, even the second example did not display correctly), but std::getline() still seems to mangle the accented characters, and I don't understand why.

I read that I might need to imbue the std::wifstream with a locale, but I couldn't figure out how to make it work.

I'm fairly new to C++, so I'm not sure whether there are better tools for this kind of problem; at least, I couldn't find any.

I'm using zsh with MinGW, in the VSCode integrated terminal.

I compile with the following command:

c++ -Wall -Wextra -Werror -std=c++20 test.cpp -o test

Solution

  • I was able to solve this problem thanks to this post!
    Imbuing a locale was the solution. Here is what solved my problem:

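    #include <clocale>
    #include <locale>
    #include <codecvt>
    #include <iostream>
    #include <fstream>
    
    int main(void)
    {
        std::setlocale(LC_ALL, "");
        std::wifstream  file("test.txt");
        // Decode the file's UTF-8 bytes into wchar_t; std::consume_header skips
        // a leading BOM. The locale takes ownership of the facet (no delete needed).
        // std::codecvt_utf8 is deprecated since C++17 but still available in C++20.
        file.imbue(std::locale(std::locale(), new std::codecvt_utf8<wchar_t,0x10ffff, std::consume_header>));
        std::wstring    s;
        std::getline(file, s);
        std::wcout << s << std::endl;
        return 0;
    }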
    

    This line:

    file.imbue(std::locale(std::locale(), new std::codecvt_utf8<wchar_t,0x10ffff, std::consume_header>));
    

    was originally:

    file.imbue(std::locale(std::locale::empty(), new std::codecvt_utf8<wchar_t,0x10ffff, std::consume_header>));
    

    However, std::locale::empty() is platform-specific (it is not part of standard C++), as seen in this SO question, so I replaced it with std::locale(), and that worked fine.