Could you please explain what exactly mbstate_t is? I have read the cppreference description, but I still don't understand its purpose. What I do understand is that mbstate_t is some static struct visible to a limited set of functions like mbtowc(), wctomb(), etc., but I am still confused about how to use it. I can see in the cppreference examples that this struct should be reset before calling some functions. Suppose I want to count the characters in a multi-language string like this one:
std::string str = "Hello! Привет!";
Apparently, str.size() cannot be used in this example, because it simply returns the number of bytes in the string. But something like this does the job:
std::locale::global(std::locale("")); // Linux, UTF-8
std::string str = "Hello! Привет!";
std::string::size_type stringSize = str.size();
std::string::size_type nCharacters = 0;
std::string::size_type nextByte = 0;
int nBytesRead = 0; // mbtowc returns int: -1 on error, 0 at the null character
std::mbtowc(nullptr, 0, 0); // What does it do, and why is it needed?
while (
    (nBytesRead = std::mbtowc(nullptr, &str[nextByte], stringSize - nextByte))
    > 0) // stop at the end of the string and on an invalid sequence
{
    ++nCharacters;
    nextByte += nBytesRead;
}
std::cout << nCharacters << '\n';
According to the cppreference examples, the mbstate_t struct should be reset before entering the while loop by calling mbtowc() with all arguments set to zero. What is the purpose of this?
The interface to mbtowc is kind of crazy. A historical mistake, I guess.
You are not required to pass it a complete string; you can pass a buffer (perhaps a network packet) that ends in an incomplete multi-byte character, and then pass the rest of the character in the next call.
So mbtowc has to store its current (possibly partial) conversion state between calls, possibly in a static variable.
A call to std::mbtowc(nullptr, 0, 0); clears this internal state, so it is ready for a new string.
You might want to use mbrtowc instead and provide a non-hidden mbstate_t as an extra parameter.
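As a rough sketch of what that could look like (count_chars is a hypothetical helper name, not something from the standard library or from the question), the function below counts characters with std::mbrtowc and an explicit std::mbstate_t owned by the caller. It assumes the C locale has already been switched to the encoding of the input, for example with std::setlocale(LC_ALL, "en_US.UTF-8"); that locale name is itself an assumption about the system.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: adds the number of multibyte characters found in
// `bytes` to `count`, keeping the conversion state in the caller's `state`.
// Returns false if an invalid byte sequence is encountered.
// Assumes the C locale already matches the encoding of `bytes`.
bool count_chars(const std::string& bytes, std::mbstate_t& state,
                 std::size_t& count)
{
    const char* p = bytes.data();
    std::size_t remaining = bytes.size();

    while (remaining > 0) {
        // A null first argument means "convert, but do not store the result".
        std::size_t n = std::mbrtowc(nullptr, p, remaining, &state);

        if (n == static_cast<std::size_t>(-1))
            return false;  // invalid sequence in the current locale
        if (n == static_cast<std::size_t>(-2))
            return true;   // chunk ended mid-character; `state` remembers the prefix
        if (n == 0)
            n = 1;         // an embedded null character occupies one byte

        ++count;
        p += n;
        remaining -= n;
    }
    return true;
}
```

Starting from a zero-initialized state (std::mbstate_t state{};) plays the role of the std::mbtowc(nullptr, 0, 0) reset, except that the state is now visible and each string or stream can have its own. Because the state is passed in, the input may also arrive in pieces: call count_chars once per chunk with the same state and count, and a character split across two chunks should be counted once, when its last byte arrives.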