c++ string c++11 size non-ascii-characters

How to measure the correct size of non-ASCII characters?


In the following program, I'm trying to measure the length of a string that contains non-ASCII characters.

However, size() doesn't return the length I expect for the non-ASCII string, and I'm not sure why.

#include <iostream>
#include <string>

int main()
{
    std::string s1 = "Hello";
    std::string s2 = "इंडिया"; // non-ASCII string
    std::cout << "Size of " << s1 << " is " << s1.size() << std::endl;
    std::cout << "Size of " << s2 << " is " << s2.size() << std::endl;
}

Output:

Size of Hello is 5
Size of इंडिया is 18

Live demo Wandbox.


Solution

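  • std::string::size() counts bytes (char elements), not characters, and each character of s2 takes three bytes in UTF-8, hence 18. I used the std::wstring_convert class to convert the UTF-8 bytes to UTF-32 and got the correct length of the string.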

    #include <codecvt>   // std::codecvt_utf8
    #include <iostream>
    #include <locale>    // std::wstring_convert
    #include <string>
    
    int main()
    {
        std::string s2 = "इंडिया"; // non-ASCII string, stored as UTF-8 bytes
        // Convert the UTF-8 bytes to UTF-32 so that each element is one code point.
        std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> cn;
        auto sz = cn.from_bytes(s2).size();
        std::cout << "Size of " << s2 << " is " << sz << std::endl;
    }
    

    Live demo wandbox.

    See the reference documentation for std::wstring_convert for more details.