c++ string utf-16

Check if all characters in a UTF-16 string are valid?


I have UTF-16 strings (std::wstring) that may contain "invalid" characters, which cause my console terminal to stop printing (see question).

Is there a fast way to check every character in a string and replace any invalid ones with '?'?

I know I could do something along these lines with a regex, but it would be hard to make it accept exactly the valid characters, and it would also be slow. Is there, for example, a numeric range of character codes I could use (e.g. "all codes between 26 and 5466 are valid")?
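On the numeric-range idea: UTF-16 itself reserves one range, the surrogate code units 0xD800 to 0xDFFF. A high surrogate (0xD800 to 0xDBFF) must be immediately followed by a low surrogate (0xDC00 to 0xDFFF), and any surrogate that is not part of such a pair makes the string malformed. The following is only a sketch under stated assumptions: it uses std::u16string rather than std::wstring (wchar_t is 16 bits on Windows but typically 32 bits elsewhere), the function names are hypothetical, and it checks well-formedness only, not whether the console can display a character.

```cpp
#include <cassert>  // assert, used in the usage sketch at the bottom
#include <cstddef>
#include <string>

// Hypothetical helpers classifying a single UTF-16 code unit.
bool is_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDFFF; }
bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
bool is_low_surrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Hypothetical helper: returns a copy of `s` in which every unpaired
// surrogate is replaced with u'?'. Correctly paired surrogates are kept.
// This checks UTF-16 well-formedness only, not printability.
std::u16string replace_lone_surrogates(const std::u16string& s) {
    std::u16string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t u = s[i];
        if (is_high_surrogate(u) && i + 1 < s.size() && is_low_surrogate(s[i + 1])) {
            out.push_back(u);         // keep a valid high/low pair
            out.push_back(s[i + 1]);
            ++i;                      // skip the low half just copied
        } else if (is_surrogate(u)) {
            out.push_back(u'?');      // lone high or lone low surrogate
        } else {
            out.push_back(u);         // ordinary non-surrogate code unit
        }
    }
    return out;
}

// Usage sketch.
void example_surrogates() {
    std::u16string s = u"ok";
    s.push_back(static_cast<char16_t>(0xD800));  // lone high surrogate
    assert(replace_lone_surrogates(s) == u"ok?");
}
```

Properly paired surrogates, such as the two code units that encode U+1F600, pass through unchanged; only a surrogate without its partner is replaced.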


Solution

  • It should be possible to use the std::ctype<wchar_t> facet (through std::isprint with a std::locale) to determine whether a character is printable, and replace the ones that are not:

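    #include <algorithm>  // std::replace_if
    #include <locale>     // std::locale, std::isprint

    std::locale loc;  // copy of the current global locale
    std::replace_if(string.begin(), string.end(),
                    [&](wchar_t c)->bool { return !std::isprint(c, loc); }, L'?');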
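A self-contained version of the snippet above, as a sketch: the function name replace_unprintable is hypothetical, and it takes the locale as a parameter because which characters std::isprint accepts depends on that locale's ctype<wchar_t> facet.

```cpp
#include <algorithm>  // std::replace_if
#include <cassert>    // assert, used in the usage sketch at the bottom
#include <locale>     // std::locale, std::isprint
#include <string>

// Hypothetical helper: returns a copy of `s` in which every character that
// std::isprint reports as non-printable under `loc` is replaced with L'?'.
std::wstring replace_unprintable(std::wstring s, const std::locale& loc) {
    std::replace_if(s.begin(), s.end(),
                    [&loc](wchar_t c) { return !std::isprint(c, loc); },
                    L'?');
    return s;
}

// Usage sketch with the classic "C" locale, in which ASCII control
// characters such as U+0001 are not printable.
void example_unprintable() {
    std::wstring in = L"ab\x01" L"c";  // split literal so 'c' is not read as a hex digit
    assert(replace_unprintable(in, std::locale::classic()) == L"ab?c");
}
```

Control characters such as '\n' and '\t' are also non-printable under this test, so multi-line text would have its line breaks replaced unless the predicate exempts them. Where wchar_t is 16 bits (Windows), the predicate also sees each half of a surrogate pair separately, so characters outside the Basic Multilingual Plane may be replaced as well.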