In the following program, I'm trying to measure the length of a string that contains non-ASCII characters.
However, I don't understand why size() doesn't return the expected length for the non-ASCII string.
#include <iostream>
#include <string>

int main()
{
    std::string s1 = "Hello";
    std::string s2 = "इंडिया"; // non-ASCII string
    std::cout << "Size of " << s1 << " is " << s1.size() << std::endl;
    std::cout << "Size of " << s2 << " is " << s2.size() << std::endl;
}
Output:
Size of Hello is 5
Size of इंडिया is 18
Live demo: Wandbox.
I then used the std::wstring_convert class and got the correct length of the string.
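#include <codecvt>  // std::codecvt_utf8
#include <iostream>
#include <locale>   // std::wstring_convert
#include <string>

int main()
{
    std::string s1 = "Hello";
    std::string s2 = "इंडिया"; // non-ASCII string
    // Decode the UTF-8 bytes into UTF-32, where each char32_t holds one code point.
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> cn;
    auto sz = cn.from_bytes(s2).size(); // number of code points, not bytes
    std::cout << "Size of " << s2 << " is " << sz << std::endl;
}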
Live demo: Wandbox.
See the std::wstring_convert reference page for more details.