I am trying to read all lines from a file, but I am getting unexpected results. Code:
var readLines = File.ReadLines(file);
foreach (var line in readLines)
{
//line = "T\0e\0s\0t\0"
}
File contents:
Test
If I call line.Replace("\0", "") it works fine, but I would like to understand why this is happening and how I can get the correct value from the file using ReadLines.
Your file seems to be encoded in UTF-16. Specify the encoding as the second parameter to ReadLines():
var readLines = File.ReadLines(file, Encoding.Unicode);
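File.ReadLines() without the second parameter assumes the file is encoded in UTF-8. UTF-16 uses two bytes for each of these characters, and for Latin letters such as those in "Test" the second byte is zero, whereas UTF-8 stores them in a single byte. Read as UTF-8, every other character in your text therefore comes out as \0. Encoding.Unicode is the little-endian form of UTF-16, which matches the T\0e\0s\0t\0 pattern you are seeing.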
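To see the mismatch without touching a file, here is a minimal sketch that encodes "Test" as UTF-16 and decodes the same bytes both ways. The class and method names (EncodingDemo, DecodeUtf16AsUtf8, DecodeUtf16AsUtf16) are made up for illustration, and the sketch assumes your file holds little-endian UTF-16, which is what the output in the question suggests.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class EncodingDemo
{
    // Encodes the text as UTF-16 LE, then (incorrectly) decodes the bytes as UTF-8.
    public static string DecodeUtf16AsUtf8(string text)
    {
        byte[] bytes = Encoding.Unicode.GetBytes(text); // 2 bytes per char here, no BOM
        return Encoding.UTF8.GetString(bytes);
    }

    // Encodes the text as UTF-16 LE and decodes it with the matching encoding.
    public static string DecodeUtf16AsUtf16(string text)
    {
        byte[] bytes = Encoding.Unicode.GetBytes(text);
        return Encoding.Unicode.GetString(bytes);
    }

    public static void Main()
    {
        string wrong = DecodeUtf16AsUtf8("Test");
        string right = DecodeUtf16AsUtf16("Test");

        Console.WriteLine(wrong.Length);              // 8
        Console.WriteLine(wrong == "T\0e\0s\0t\0");   // True
        Console.WriteLine(right);                     // Test
    }
}
```

Each zero byte is a valid UTF-8 encoding of the NUL character, so the UTF-8 decoder reports no error and simply returns a string twice as long as expected.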