Tags: c++, windows, unicode, command-line-arguments, boost-program-options

Reading Unicode characters from command line arguments using boost::program_options in Windows


I have several Windows applications that read a file path from command-line arguments. Everything works flawlessly, except when passing paths with non-ANSI characters. I expected this, but I don't know how to deal with it. It is probably an entry-level question, but it is driving me crazy.

My current code looks like:

#include <iostream>
#include <string>
#include <boost/program_options.hpp>

int main(int argc, char* argv[]) {
    namespace po = boost::program_options;

    po::options_description po_desc("Allowed options");
    po_desc.add_options()
        ("file", po::value<std::string>(), "path to file");

    po::variables_map po_vm;
    try {
        po::store(po::parse_command_line(argc, argv, po_desc), po_vm);
        po::notify(po_vm);
    } catch (...) {
        std::cout << po_desc << std::endl;
        return 1;  // non-zero exit code signals the parse error
    }

    const std::string file_path = po_vm["file"].as<std::string>();

    // ...
}

I've found that if I change the type of file_path from std::string to boost::filesystem::path, some of these paths are read correctly. I don't know exactly why, but I suspect it has to do with a conversion from the Latin-1 character set.

For example, given the following files:

malaga.txt
málaga.txt
mąlaga.txt

The first is always read correctly; the second fails with std::string file_path but works with boost::filesystem::path file_path; the third always fails.
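A simplified model of this observation, offered as an assumption rather than something the question establishes: with a narrow main, the arguments arrive encoded in the system's ANSI code page, which for Western European locales is Windows-1252, a close relative of Latin-1. á (U+00E1) has a single-byte representation there, while ą (U+0105) does not, so it is lost or replaced before the program ever sees it. The hypothetical helper below only checks whether every character fits in Latin-1:

```cpp
#include <string>

// Simplified model (assumption): treat the ANSI code page as Latin-1, i.e.
// a character survives a narrow argv only if its code point is <= 0xFF.
// Real code pages such as Windows-1252 differ in the 0x80-0x9F range.
bool fits_in_latin1(const std::wstring& s) {
    for (wchar_t c : s) {
        if (static_cast<unsigned long>(c) > 0xFF) {
            return false;  // no single-byte Latin-1 representation
        }
    }
    return true;
}
```

With this model, fits_in_latin1(L"m\u00E1laga.txt") is true and fits_in_latin1(L"m\u0105laga.txt") is false, matching the second and third file above.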

I've tried switching the main function to int main(int argc, wchar_t* argv[]) and using std::wstring for the argument type, but that is not compatible with the boost::program_options parser.

How can I correctly read such Unicode file names?


Solution

  • Thanks to everyone who contributed in the comments; thanks to them I managed to solve my problem.

    TL;DR

    Here is the fixed code:

    #include <iostream>
    #include <string>
    #include <boost/filesystem.hpp>
    #include <boost/program_options.hpp>

    int wmain(int argc, wchar_t* argv[]) { // <<<
        namespace po = boost::program_options;
    
        po::options_description po_desc("Allowed options");
        po_desc.add_options()
            ("file", po::wvalue<std::wstring>(), "path to file") // <<<
            ("ansi", po::value<std::string>(), "an ANSI string")
            ;
    
        po::variables_map po_vm;
        try {
            po::store(po::wcommand_line_parser(argc, argv) // <<<
                        .options(po_desc)
                        .run(),
                      po_vm);
            po::notify(po_vm);
        } catch (...) {
            std::cout << po_desc << std::endl;
            return 1;  // non-zero exit code signals the parse error
        }
    
        const boost::filesystem::path file_path = po_vm["file"].as<std::wstring>(); // <<<
    
        // ...
    }
    

    Explanation

    First, switch to wmain and wchar_t* argv[]: as mentioned by @erik-sun, the entry point must be a Unicode-aware function. Important note: int main(int, wchar_t*[]) compiles, but the runtime still passes it narrow arguments, so it does not receive them in the correct encoding and the parser fails. You have to use wmain.
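    wmain is a Microsoft-specific entry point (MSVC; MinGW-w64 supports it with -municode). If the same sources also have to build elsewhere, one common pattern, sketched here as an assumption and not part of this answer, is to alias the native argument character type: wchar_t on Windows, and char on other platforms, where argv elements are byte strings (usually UTF-8). The names native_char, native_string, collect_args and run are hypothetical:

```cpp
#include <string>
#include <vector>

// Hypothetical alias for the character type of the native argv.
#ifdef _WIN32
using native_char = wchar_t;  // wmain receives UTF-16 arguments
#else
using native_char = char;     // POSIX argv is bytes, usually UTF-8
#endif

using native_string = std::basic_string<native_char>;

// Copy argv (excluding the program name) into owned strings so the rest of
// the program does not depend on the entry-point signature.
std::vector<native_string> collect_args(int argc, const native_char* const argv[]) {
    std::vector<native_string> args;
    for (int i = 1; i < argc; ++i) {
        args.emplace_back(argv[i]);
    }
    return args;
}

// The entry point would then be chosen per platform, for example:
//   #ifdef _WIN32
//   int wmain(int argc, wchar_t* argv[]) { return run(collect_args(argc, argv)); }
//   #else
//   int main(int argc, char* argv[])     { return run(collect_args(argc, argv)); }
//   #endif
// where run() is your own (hypothetical) application function.
```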

    Then, the Unicode support link provided by @richard-critten was very useful for understanding the compilation errors:

    • use boost::program_options::wvalue when the value type is wide-char. The internal implementation uses a string stream, and the default one (value) only works with 8-bit chars.
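    • use boost::program_options::wcommand_line_parser to accept wchar_t* arguments. This class has no all-in-one constructor, so when using it directly you need the long form (.options(...).run()). Note that the po::parse_command_line helper is a template on the character type, so po::parse_command_line(argc, argv, po_desc) should also accept a wchar_t* argv.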
    • finally, retrieve the value as std::wstring when needed.
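    The string-stream point in the first bullet can be illustrated with the standard library alone. This is not Boost's implementation, only a sketch of the same idea under that assumption: a value parser built on std::basic_istringstream has to use the same character type as the token it reads, so wide tokens need a wide stream. parse_token is a hypothetical name:

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a token of any character type to T by
// streaming it through a string stream of the *same* character type.
// A std::istringstream (char) cannot even be constructed from a
// std::wstring token, which is the mismatch wvalue avoids.
template <class T, class CharT>
T parse_token(const std::basic_string<CharT>& token) {
    std::basic_istringstream<CharT> in(token);
    T result{};
    // Fail on unparsable input or on trailing non-whitespace characters.
    if (!(in >> result) || !(in >> std::ws).eof()) {
        throw std::invalid_argument("token is not a valid value");
    }
    return result;
}
```

For example, parse_token<int>(std::wstring(L"42")) uses a std::wistringstream, while parse_token<int>(std::string("42")) uses a std::istringstream.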

    I've extended the fixed code with an ansi option to show that narrow std::string values still work alongside the wide parser.
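    If another part of the program still needs a narrow std::string (for logging, for example), one option, not part of the original answer, is to encode the wide value as UTF-8. The sketch below is hand-written and assumes wchar_t holds UTF-16 when it is 2 bytes wide (Windows) and UTF-32 otherwise; invalid input becomes U+FFFD. Keep in mind that the narrow Windows file APIs expect the ANSI code page by default, not UTF-8, so keep using the wide value (or boost::filesystem::path) to open files. to_utf8 and append_utf8 are hypothetical names:

```cpp
#include <cstddef>
#include <string>

// Append the UTF-8 encoding of one code point (at most U+10FFFF) to out.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert a wide string to UTF-8 (assumes UTF-16 or UTF-32 wchar_t).
std::string to_utf8(const std::wstring& ws) {
    std::string out;
    for (std::size_t i = 0; i < ws.size(); ++i) {
        char32_t cp = static_cast<char32_t>(ws[i]);
        // With 2-byte wchar_t, combine a high/low surrogate pair.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF &&
            i + 1 < ws.size()) {
            char32_t low = static_cast<char32_t>(ws[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;  // the low surrogate has been consumed too
            }
        }
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            cp = 0xFFFD;  // unpaired surrogate or out-of-range value
        }
        append_utf8(out, cp);
    }
    return out;
}
```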

    Side note

    My complete solution requires instantiating a Qt QApplication at some point, and the QApplication constructor is incompatible with the wide-char argv. As no command-line interaction is needed with the Qt part (everything is processed long before by Boost), it can be constructed with fake arguments instead:

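    int fake_argc = 1;
    char app_name[] = "ApplicationName";      // modifiable buffer: string literals are const in C++
    char* fake_argv[] = {app_name, nullptr};  // argv is conventionally null-terminated
    QApplication a(fake_argc, fake_argv);     // fake_argc and fake_argv must outlive 'a'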
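    Qt's documentation states that the data referred to by argc and argv must stay valid for the entire lifetime of the application object. One way to make that explicit, sketched with a hypothetical FakeArgs class that is neither part of Qt nor of the original answer, is to bundle the fake arguments in an object created before, and destroyed after, the QApplication:

```cpp
#include <array>
#include <string>
#include <utility>

// Hypothetical holder for a fake argc/argv pair. QApplication takes argc by
// reference and keeps argv, so this object must outlive the QApplication.
class FakeArgs {
public:
    explicit FakeArgs(std::string app_name) : name_(std::move(app_name)) {
        argv_[0] = &name_[0];  // non-const char* into name_'s buffer
        argv_[1] = nullptr;    // conventional argv terminator
    }
    FakeArgs(const FakeArgs&) = delete;             // argv_ points into name_,
    FakeArgs& operator=(const FakeArgs&) = delete;  // so copies would dangle

    int& argc() { return argc_; }
    char** argv() { return argv_.data(); }

private:
    std::string name_;
    int argc_ = 1;
    std::array<char*, 2> argv_{};
};

// Usage (requires Qt):
//   FakeArgs fake("ApplicationName");
//   QApplication a(fake.argc(), fake.argv());
```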