Tags: c++, unicode, wxwidgets, stdstring

Cross-platform file names and wxString::ToStdString()


I want to handle files in a cross-platform application using wxWidgets 3.1. I rely on some functions that only accept the file name as an std::string.

On Windows, I can simply use wxString::ToStdString() and everything is fine.

On Linux (Ubuntu 20.04 LTS), the conversion fails and returns an empty string when the file name or path contains "special" characters (e.g., the default downloads directory on a French Ubuntu, "Téléchargements").

When I explicitly specify the UTF-8 converter on Linux, the conversion succeeds:

    std::string str = wxs.ToStdString(wxMBConvUTF8());

But this does not work on Windows and scrambles the "special" characters.

I guess I could write platform-dependent code to deal with this, but that would defeat the purpose of the toolkit.

I have done quite a bit of research on this, but I am utterly confused now. I thought wxString uses std::string under the hood in a Unicode wxWidgets build (which I am using)? Why is this (apparently) platform-dependent? What am I missing?

Here is a minimal example. It asks for a file and then tries to open it twice, showing the result in a message box each time: first with the name converted by ToStdString(), then with the name converted by ToUTF8(). On Windows the first attempt succeeds and the second fails; on Linux it is the other way round.

#include "wx/wx.h"
#include <fstream>

class MyApp : public wxApp
{
public:
    virtual bool OnInit() wxOVERRIDE;
};

class MyFrame : public wxFrame
{
public:
    MyFrame(const wxString& title);

private:
};
wxIMPLEMENT_APP(MyApp);


bool MyApp::OnInit()
{
    if (!wxApp::OnInit())
        return false;

    // create the main application window
    MyFrame* frame = new MyFrame("Minimal wxWidgets App");
    frame->Show(true);
    return true;
}


// Some file I/O function that, like the real ones, only accepts a std::string
std::string openFile(const std::string& fileName)
{
    std::ifstream file(fileName);
    if (file)
    {
        return "Success!";
    }
    else
    {
        return "Failure!";
    }
}

// frame constructor
MyFrame::MyFrame(const wxString& title)
    : wxFrame(NULL, wxID_ANY, title)
{
    wxString name = wxFileSelector("Pick a file");  // Pick a file with a "special" character in the name, e.g. Äréa.txt
    if (name.empty())
        return;  // the dialog was cancelled

    wxMessageBox(openFile(name.ToStdString()));  // Success! on Windows; Failure! on Linux

    wxMessageBox(openFile(std::string(name.ToUTF8())));  // Failure! on Windows; Success! on Linux
}
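A possible portable variant of openFile() is sketched below. It assumes C++17 and that you control the function's signature, which is not the case for the third-party functions mentioned in the question. The name is kept as UTF-8 (for example from name.ToUTF8() or, in 3.1.5+, name.utf8_string()) and std::filesystem::u8path converts it to the native path type, which is stored as UTF-16 on Windows. The name openFileUtf8 is hypothetical.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Hypothetical UTF-8 variant of openFile() from the example (requires C++17).
// std::filesystem::u8path() interprets the bytes as UTF-8 and converts them to
// the native path encoding: UTF-16 on Windows, typically unchanged bytes on Linux.
// (u8path is deprecated since C++20 but still provided.)
std::string openFileUtf8(const std::string& utf8Name)
{
    const std::filesystem::path path = std::filesystem::u8path(utf8Name);
    std::ifstream file(path);  // C++17 added an std::ifstream constructor taking a path
    return file ? "Success!" : "Failure!";
}
```

Under these assumptions, calling openFileUtf8(std::string(name.ToUTF8())) in MyFrame should take the same code path on both platforms, provided the standard library can open std::filesystem::path objects with non-ASCII names on Windows (MSVC's can).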

EDIT: I found out that this was recently added to wxWidgets (in 3.1.5):

    Add wxString::utf8_string()

    This adds a yet another conversion function, which is not ideal, but still better than having to write ToStdString(wxConvUTF8) every time for losslessly converting wxString to std::string: not only this is too long, but it's also too easy to forget to specify wxConvUTF8, resulting in data loss when using non-UTF-8 locale.

So you would think that clears it up. BUT I am using a UTF-8 locale! This is the output of the locale command:

LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=de_DE.UTF-8
LC_TIME=de_DE.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=de_DE.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=de_DE.UTF-8
LC_NAME=de_DE.UTF-8
LC_ADDRESS=de_DE.UTF-8
LC_TELEPHONE=de_DE.UTF-8
LC_MEASUREMENT=de_DE.UTF-8
LC_IDENTIFICATION=de_DE.UTF-8
LC_ALL=
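An explicit UTF-8 conversion (ToStdString(wxConvUTF8), ToUTF8() or utf8_string()) always produces UTF-8, whatever the locale. To illustrate that producing UTF-8 needs no locale information at all, here is a minimal, hypothetical encoder. It is not wxWidgets' implementation, and it takes a std::u32string of code points (an assumption made to sidestep the different sizes of wchar_t on Windows and Linux).

```cpp
#include <string>

// Minimal UTF-8 encoder (sketch): turns Unicode code points into UTF-8 bytes.
// It never consults the locale, so the result is the same whether the program
// runs in "C", en_US.UTF-8 or any other locale.
// Assumes valid code points (no surrogates, <= U+10FFFF); no error handling.
std::string encodeUtf8(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text)
    {
        if (cp < 0x80)
        {
            out += static_cast<char>(cp);                         // 0xxxxxxx
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));           // 110xxxxx
            out += static_cast<char>(0x80 | (cp & 0x3F));         // 10xxxxxx
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));          // 1110xxxx
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));          // 11110xxx
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, the 'é' in "Téléchargements" (U+00E9) becomes the two bytes 0xC3 0xA9.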

EDIT2: I changed the minimal example to something closer to what I am actually trying to do.

EDIT3: I changed the example to something even closer to what I am actually trying to do. :-)


Solution

  • Apparently, wxString::ToStdString() called without an argument converts using the encoding of the program's current locale (its default converter is wxConvLibc).

    The default locale of the program is not the locale set in the user environment: all C and C++ programs start in the locale named "C". To use the locale specified in the user environment, call

    setlocale(LC_ALL, "");

    at the beginning of the program (setlocale() is declared in <clocale>). C++ has its own way of doing this:

    std::locale::global(std::locale(""));

    but it sometimes doesn't work, presumably because of bugs in the C++ standard library implementation, so the C way is more reliable.

    If your framework has its own way of dealing with locales (in wxWidgets, the wxLocale class), you should probably use it instead.
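The locale dependence can be reproduced without wxWidgets. The sketch below converts wide text through the C library's wcsrtombs(), which encodes according to the current LC_CTYPE locale. It assumes, as the name suggests, that a locale-based converter such as wxConvLibc delegates to the C library in a similar way; it is not wxWidgets code, and toCurrentLocale and useEnvironmentLocale are hypothetical names.

```cpp
#include <cassert>
#include <clocale>
#include <cstddef>
#include <cwchar>
#include <optional>
#include <string>

// The fix from the answer: switch from the default "C" locale to the one
// configured in the user's environment (LANG, LC_ALL, LC_CTYPE, ...).
// Returns false if the environment names a locale that is not installed.
bool useEnvironmentLocale()
{
    return std::setlocale(LC_ALL, "") != nullptr;
}

// Converts wide text with the C library, i.e. according to the current locale.
// Returns std::nullopt if some character cannot be encoded in that locale.
std::optional<std::string> toCurrentLocale(const std::wstring& wide)
{
    std::mbstate_t state{};
    const wchar_t* src = wide.c_str();

    // First pass: compute the required number of bytes without writing anything.
    const std::size_t len = std::wcsrtombs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1))
        return std::nullopt;  // a character has no encoding in this locale

    // Second pass: convert into a buffer of exactly that size.
    std::string narrow(len, '\0');
    src = wide.c_str();
    state = std::mbstate_t{};
    const std::size_t written = std::wcsrtombs(&narrow[0], &src, len, &state);
    assert(written == len);
    (void)written;  // avoid an unused-variable warning when NDEBUG is defined
    return narrow;
}
```

In the "C" locale an ASCII name converts fine, but "Téléchargements" typically does not (with glibc, for instance), which matches the empty string returned by ToStdString(). After useEnvironmentLocale() in an environment with LANG=en_US.UTF-8, the same call should yield the UTF-8 bytes.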