c++ stl wstring setlocale

wstring::find() doesn't work with non-Latin symbols?


I have a wide-character string (std::wstring) in my code, and I need to search for a wide character in it.

I use the find() function for this:

    wcin >> str;
    wcout << ((str.find(L'ф') != wstring::npos)? L"EXIST":L"NONE");

L'ф' is a Cyrillic letter.

But find() in this call always returns npos. With Latin letters, find() works correctly.

Is this a problem with the function, or am I doing something wrong?

UPD

I use MinGW and save the source file as UTF-8. I also set the locale with setlocale(LC_ALL, "");. Code such as wcout << L'ф'; works correctly, but this

    wchar_t w;
    wcin >> w;
    wcout << w;

does not work correctly.

That is strange. Previously I had no encoding problems when using setlocale().


Solution

  • The encoding of your source file and the execution environment's encoding may be wildly different. C++ makes no guarantees about any of this. You can check this by outputting the hexadecimal value of your character literal (std::hex only affects integers, so the character has to be cast first):

    std::wcout << std::hex << static_cast<unsigned long>(L'ф');
    

    Before C++11, you could use non-ASCII characters in source code by using their hex values (the literal is split in two so that the following f is not parsed as part of the hex escape):

    "\x05" "five"
    

    You can also specify the Unicode code point directly with a universal character name, which in your case (Cyrillic ф, U+0444) would be

    L"\u0444"
    

    If you're going full C++11 (and your environment ensures these are encoded in UTF-*), you can use any of char, char16_t, or char32_t, and do:

    const char* ef_utf8 = "\u0444";
    const char16_t* ef_utf16 = u"\u0444";
    const char32_t* ef_utf32 = U"\u0444";
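
To narrow down whether the literal or the input is at fault, it can help to dump the code units of both and compare them. The following is only a minimal sketch: hex_units and contains_cyrillic_ef are hypothetical helper names invented for this illustration, and it assumes your compiler encodes wide literals as UTF-16 or UTF-32 (which GCC and MinGW normally do).

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: render every wchar_t code unit of `s` as
// uppercase hex, separated by spaces, so it can be inspected or logged.
std::string hex_units(const std::wstring& s) {
    std::ostringstream out;
    out << std::hex << std::uppercase;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i != 0) out << ' ';
        // Cast to an integer type; streaming a wchar_t directly would
        // not print its numeric value in hex.
        out << static_cast<std::uint32_t>(s[i]);
    }
    return out.str();
}

// Hypothetical helper: search for Cyrillic ф using a universal character
// name, so the result does not depend on the source file's encoding.
bool contains_cyrillic_ef(const std::wstring& s) {
    return s.find(L'\u0444') != std::wstring::npos;
}
```

Calling hex_units on the string read from wcin shows what find() is actually comparing against. If the dump of your input does not contain 444 where you typed ф, then the input stream did not decode the character to the same code point as the literal, and find() is behaving correctly on mismatched data.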