Tags: c++, char, stdstring

0xc2 character in std::string


The following string has size 4, not 3 as I would have expected.

std::string s = "\r\n½";
std::size_t ss = s.size(); // ss is 4

When I loop through the string character by character, escaping each one to hex, I get:

  • 0x0D (hex code for carriage return)
  • 0x0A (hex code for line feed)
  • 0xc2 (hex code, but what is this?)
  • 0xbd (hex code for the ½ character)
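
For reference, a byte-by-byte dump like the one above can be sketched roughly as follows. The helper name `hex_bytes` is hypothetical (it is not from the original post), and the input literal spells out the bytes explicitly so the result does not depend on how the source file is encoded.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: formats every byte of s as "0xNN".
std::vector<std::string> hex_bytes(const std::string& s) {
    std::vector<std::string> out;
    // Iterating as unsigned char avoids sign extension for bytes >= 0x80.
    for (unsigned char byte : s) {
        char buf[5];  // "0x" + two hex digits + terminating '\0'
        std::snprintf(buf, sizeof buf, "0x%02X", static_cast<unsigned>(byte));
        out.push_back(buf);
    }
    return out;
}
```

Calling `hex_bytes("\r\n\xC2\xBD")` yields one entry per byte of the string, which is why four values show up for what looks like three characters.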

Where does the 0xc2 come from? Is it some sort of encoding information? I thought std::string had one char per visible character in the string. Can someone confirm that 0xc2 is a "character set modifier"?


Solution

  • In Unicode, "½" has the code point U+00BD, which UTF-8 encodes as the two-byte sequence 0xC2 0xBD. So 0xc2 is not a character set modifier; it is the first byte of that character's UTF-8 encoding. Your string therefore contains only three characters but is four bytes long.

    std::string::size is unaware of the string's content encoding and returns the number of char elements, i.e. a byte count.

    See https://www.fileformat.info/info/unicode/char/00bd/index.htm

    Additional reading on SO: std::wstring VS std::string.
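
If you need the number of characters (code points) rather than bytes, one common approach is to skip UTF-8 continuation bytes while counting. The following is a minimal sketch, not a full decoder: the function name `utf8_code_points` is hypothetical, and it assumes the string already holds valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in s, assuming s is valid UTF-8.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        // Continuation bytes have the bit pattern 10xxxxxx;
        // every other byte starts a new code point.
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the string from the question, written with explicit bytes as `"\r\n\xC2\xBD"`, `size()` reports 4 while this helper reports 3. Note that a code point is still not always one visible character (combining marks, for example), so this only addresses the byte versus code point distinction.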