
Strange Unicode error when converting Chinese wide strings to regular strings in C++


Some of my Chinese users reported a strange C++ exception being thrown when my Windows C++ code tries to list all running processes:

在多字节的目标代码页中,没有此 Unicode 字符可以映射到的字符。

Translated into English, this is the standard Windows system message for ERROR_NO_UNICODE_TRANSLATION (error 1113):

No mapping for the Unicode character exists in the target multi-byte code page.

The code that logs this message is:

try
{
    list_running_processes();
}
catch (const std::exception& exception) // by const reference; also covers e.g. std::out_of_range
{
    LOG_S(ERROR) << exception.what();
    return EXIT_FAILURE;
}

The most likely culprit is this function:

std::vector<running_process_t> list_running_processes()
{
    std::vector<running_process_t> running_processes;

    const auto snapshot_handle = unique_handle(CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0));
    if (snapshot_handle.get() == INVALID_HANDLE_VALUE)
    {
        throw std::runtime_error("CreateToolhelp32Snapshot() failed");
    }
    
    PROCESSENTRY32 process_entry{};
    process_entry.dwSize = sizeof process_entry;

    if (Process32First(snapshot_handle.get(), &process_entry))
    {
        do
        {
            const auto process_id = process_entry.th32ProcessID;
            const auto executable_file_path = get_file_path(process_id);
            // *** HERE ***
            const auto process_name = wide_string_to_string(process_entry.szExeFile);
            running_processes.emplace_back(executable_file_path, process_name, process_id);
        } while (Process32Next(snapshot_handle.get(), &process_entry));
    }

    return running_processes;
}

Or, alternatively, this one:

std::string get_file_path(const DWORD process_id)
{
    std::string file_path;
    const auto snapshot_handle = unique_handle(CreateToolhelp32Snapshot(TH32CS_SNAPMODULE, process_id));
    MODULEENTRY32W module_entry32{};
    module_entry32.dwSize = sizeof(MODULEENTRY32W);
    if (Module32FirstW(snapshot_handle.get(), &module_entry32))
    {
        do
        {
            if (module_entry32.th32ProcessID == process_id) 
            {
                return wide_string_to_string(module_entry32.szExePath); // *** HERE ***
            }
        } while (Module32NextW(snapshot_handle.get(), &module_entry32));
    }

    return file_path;
}

This is the function that converts a std::wstring to a regular std::string encoded as UTF-8:

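std::string wide_string_to_string(const std::wstring& wide_string)
{
    if (wide_string.empty())
    {
        return std::string();
    }

    // First call: query the required buffer size in bytes (0 means failure).
    const auto wide_size = static_cast<int>(wide_string.size());
    const auto size_needed = WideCharToMultiByte(CP_UTF8, 0, wide_string.data(),
        wide_size, nullptr, 0, nullptr, nullptr);
    if (size_needed <= 0)
    {
        throw std::runtime_error("WideCharToMultiByte() failed to compute the buffer size");
    }

    // Second call: perform the actual UTF-16 -> UTF-8 conversion.
    std::string str_to(size_needed, '\0');
    if (WideCharToMultiByte(CP_UTF8, 0, wide_string.data(), wide_size, &str_to[0],
        size_needed, nullptr, nullptr) != size_needed)
    {
        throw std::runtime_error("WideCharToMultiByte() failed to convert the string");
    }
    return str_to;
}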
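For context, the conversion in wide_string_to_string() is itself correct: it turns UTF-16 into UTF-8, where a CJK character such as 中 (U+4E2D) becomes three bytes, whereas in code page 936, the usual ANSI code page on Simplified Chinese Windows, the same character takes two bytes. The following portable sketch mimics what WideCharToMultiByte(CP_UTF8, ...) does for well-formed input, using std::u16string as a stand-in for the 16-bit std::wstring on Windows. The name utf16_to_utf8 is hypothetical, and replacing unpaired surrogates with U+FFFD is this sketch's own choice.

```cpp
#include <cstddef>
#include <string>

// Hypothetical portable sketch of UTF-16 -> UTF-8 encoding.
// std::u16string stands in for Windows' 16-bit std::wstring.
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    out.reserve(in.size() * 3); // at most 3 bytes per UTF-16 code unit
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char32_t cp = in[i];
        // Combine a high surrogate with a following low surrogate.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
        {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        else if (cp >= 0xD800 && cp <= 0xDFFF)
        {
            cp = 0xFFFD; // unpaired surrogate: emit the replacement character
        }

        if (cp < 0x80) // 1 byte (ASCII)
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800) // 2 bytes
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000) // 3 bytes (includes most CJK characters)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else // 4 bytes (supplementary planes)
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, utf16_to_utf8(u"\u4E2D") yields the three bytes E4 B8 AD. A program that hands those bytes to an API expecting code page 936 will not see 中.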

Is there any reason this could fail on file paths containing Chinese characters, or on a Chinese-language Windows installation? The code works fine on Western-locale Windows machines. Let me know if I'm missing any crucial information; I can't debug or test this myself right now because I don't have access to one of the affected machines.


Solution

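  • I managed to test on a Chinese machine, and it turns out that converting a file path from a wide string to a regular string with wide_string_to_string() produces a bad file path if the path contains non-ASCII characters, such as Chinese ones.

    I fixed this bug by replacing the calls to wide_string_to_string() with std::filesystem::path(wide_string_file_path).string(). The likely reason this works is that wide_string_to_string() produces UTF-8, but narrow (char-based) Windows file APIs interpret a std::string in the active ANSI code page (for example, code page 936 on Simplified Chinese Windows), and with MSVC, path::string() converts to that ANSI code page. Note that with MSVC, path::string() can itself throw a std::system_error if a character has no equivalent in the ANSI code page, so keeping paths as std::wstring or std::filesystem::path avoids the lossy conversion entirely.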
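The difference between the two narrow encodings can also be seen within std::filesystem itself: path::string() returns the native narrow encoding (the ANSI code page with MSVC), while path::u8string() always returns UTF-8, which is what wide_string_to_string() produced. A minimal sketch, assuming C++17 or later; the helper name path_to_utf8 is hypothetical. The idea is to keep the path as a std::filesystem::path and produce UTF-8 only where UTF-8 is actually wanted, for example for log output.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: returns the path as UTF-8 bytes in a std::string,
// independent of the platform's native narrow encoding.
// u8string() returns std::string in C++17 and std::u8string in C++20;
// the reinterpret_cast makes the helper compile under both.
std::string path_to_utf8(const std::filesystem::path& path)
{
    const auto utf8 = path.u8string();
    return std::string(reinterpret_cast<const char*>(utf8.data()), utf8.size());
}
```

On Windows, one could construct the path directly from module_entry32.szExePath, pass it unchanged to file APIs, and call path_to_utf8() only when writing the log line.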