The following string has size 4 not 3 as I would have expected.
std::string s = "\r\n½";
int ss = s.size(); //ss is 4
When I loop through the string character by character, escaping each one to hex, I get
Where does the 0xc2 come from? Is it some sort of encoding information? I thought std::string had one char per visible character in the string. Can someone confirm 0xc2 is a "character set modifier"?
"½" has the Unicode code point U+00BD and is represented in UTF-8 by the two-byte sequence 0xC2 0xBD. This means your string contains only three characters, but is four bytes long.
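To see where the 0xc2 comes from, here is a minimal sketch of the UTF-8 encoding rule. The helper name `encode_utf8` is made up for illustration and is not part of the standard library; it does no validation (surrogates and out-of-range values are not rejected).

```cpp
#include <string>

// Hypothetical helper: encode a single Unicode code point as UTF-8.
// No validation is performed; this is only meant to show the bit layout.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For U+00BD (binary 1011 1101) the two-byte branch applies: the top two bits give 0xC0 | 0x02 = 0xC2 and the low six bits give 0x80 | 0x3D = 0xBD. So 0xc2 is simply the lead byte of the encoded character, not a separate modifier.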
std::string::size is unaware of the string's content encoding and returns a byte count.
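If you need the number of code points rather than bytes, one possible approach is to skip UTF-8 continuation bytes (those of the form 10xxxxxx). The sketch below assumes the string holds valid UTF-8; `count_code_points` and `to_hex` are hypothetical helper names, not standard functions.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Count code points by counting every byte that is NOT a continuation byte.
// Assumes s is valid UTF-8; malformed input gives a meaningless count.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// Render each byte as 0xNN, separated by spaces, to inspect the raw content.
std::string to_hex(const std::string& s) {
    std::string out;
    char buf[8];
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "0x%02x ", static_cast<unsigned>(c));
        out += buf;
    }
    if (!out.empty()) {
        out.pop_back();  // drop the trailing space
    }
    return out;
}
```

With the string from the question written as explicit bytes, `std::string s = "\r\n\xC2\xBD";`, `s.size()` is 4, `count_code_points(s)` is 3, and `to_hex(s)` yields `0x0d 0x0a 0xc2 0xbd`. Note that a code point count is still not always the number of visible characters, since combining marks are separate code points.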
See https://www.fileformat.info/info/unicode/char/00bd/index.htm
Additional reading on SO: std::wstring VS std::string.