File.ReadLines returns a null char after every character


I am trying to read all lines from a file, but I am getting unexpected results. Code:

var readLines = File.ReadLines(file);

foreach (var line in readLines)
{
    //line = "T\0e\0s\0t\0"
}

File contents:

Test

If I call line.Replace("\0", "") it works fine, but I would like to understand why this is happening and how I can get the correct value from the file using ReadLines.


Solution

  • Your file seems to be encoded in UTF-16. Pass the encoding as the second argument to File.ReadLines() (Encoding lives in the System.Text namespace):

    var readLines = File.ReadLines(file, Encoding.Unicode);
    

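    File.ReadLines() without the second parameter assumes the file is encoded in UTF-8. UTF-16 uses two bytes for each of these characters: for basic Latin characters such as the ones in "Test", one byte holds the character code and the other byte is zero (Encoding.Unicode is little-endian UTF-16, so the zero byte comes second). UTF-8 encodes the same characters in a single byte each, so when the UTF-16 bytes are decoded as UTF-8, every zero byte is read as a separate \0 character, which is why you see one after each real character.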
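To see the effect for yourself, here is a minimal sketch that reproduces it. It assumes .NET 5 or later (top-level statements) and that your file is UTF-16 little-endian without a byte order mark; the temp file name utf16-demo.txt is made up for the demonstration and is not from the question.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical temp file, used only for this demonstration.
string path = Path.Combine(Path.GetTempPath(), "utf16-demo.txt");

// Write "Test" as raw UTF-16 LE bytes. GetBytes does not emit a BOM,
// so the file contains only the two-byte code units.
File.WriteAllBytes(path, Encoding.Unicode.GetBytes("Test"));

// Each character is followed by a zero byte.
byte[] raw = File.ReadAllBytes(path);
Console.WriteLine(BitConverter.ToString(raw)); // 54-00-65-00-73-00-74-00

// Default overload decodes as UTF-8: every zero byte becomes a '\0' char.
string asUtf8 = File.ReadLines(path).First();

// Passing Encoding.Unicode decodes the byte pairs as UTF-16 LE.
string asUtf16 = File.ReadLines(path, Encoding.Unicode).First();

Console.WriteLine(asUtf8.Length); // 8
Console.WriteLine(asUtf16);       // Test

File.Delete(path);
```

The first read gives the 8-character string "T\0e\0s\0t\0" from the question, and the second gives the expected 4-character "Test". If your file turns out to be big-endian UTF-16 instead, Encoding.BigEndianUnicode would be the one to pass.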