c# unicode character-encoding textreader

Why does TextReader.Read return an int, not a char?


Consider the following code (.Dump() in LINQPad simply writes the value to the output window):

var s = "𤭢"; // U+24B62, outside the BMP: a surrogate pair in UTF-16, 4 bytes in UTF-8 and UTF-32
s.Dump();
s.Length.Dump(); // 2 (UTF-16 code units, not code points)
TextReader sr = new StringReader("𤭢");
int i;
while((i = sr.Read()) >= 0)
{
    // note that Read() yields two 16-bit
    // UTF-16 code units, each returned as an int
    i.ToString("X").Dump(); // D852, DF62
}

Given the output above, why does TextReader.Read() return an int and not a char? Under what circumstances might it return a value that does not fit in 2 bytes?


Solution

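  • TextReader.Read() never reads more than one char (a single 16-bit UTF-16 code unit) per call. However, it returns -1 to mean "no more characters to read" (end of input). Char is unsigned and every value from 0x0000 to 0xFFFF is a valid char, so there is no spare value left for that sentinel. The return type therefore has to be widened from Char (2 bytes) to Int32 (4 bytes) so it can express the full Char range plus -1.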
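
As a rough illustration of the point above, here is a minimal sketch of how the int result is typically consumed: compare it with -1 first, then cast to char. The class ReadDemo and the helper ReadCodePoints are hypothetical names invented for this example, not part of the framework. The surrogate-pair handling is an extra, added only to show how the two values from the question (D852, DF62) combine into one code point. It assumes the reader supports Peek(), as StringReader does.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ReadDemo
{
    // Hypothetical helper: drains a TextReader with Read() and combines
    // UTF-16 surrogate pairs into full Unicode code points.
    public static List<int> ReadCodePoints(TextReader reader)
    {
        var codePoints = new List<int>();
        int i;
        // -1 is the end-of-input sentinel; it cannot be represented as a char.
        while ((i = reader.Read()) != -1)
        {
            // Safe cast: any non-sentinel result is in the range 0x0000..0xFFFF.
            char c = (char)i;
            if (char.IsHighSurrogate(c))
            {
                // Assumption: Peek() is supported by this reader (true for StringReader).
                int next = reader.Peek();
                if (next != -1 && char.IsLowSurrogate((char)next))
                {
                    reader.Read(); // consume the low surrogate
                    codePoints.Add(char.ConvertToUtf32(c, (char)next));
                    continue;
                }
            }
            // BMP character, or an unpaired surrogate passed through unchanged.
            codePoints.Add(c);
        }
        return codePoints;
    }

    public static void Main()
    {
        // "a" followed by the surrogate pair D852 DF62 from the question.
        foreach (int cp in ReadCodePoints(new StringReader("a\uD852\uDF62")))
        {
            Console.WriteLine("U+" + cp.ToString("X4"));
        }
    }
}
```

Each individual Read() call still returns at most a 16-bit value. A number above 0xFFFF only appears once the caller combines the two surrogates itself, here via char.ConvertToUtf32.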