Tags: c#, encoding, xmlreader, memory-mapped-files

Change the encoding to UTF-8 on a stream (MemoryMappedViewStream)


I am using the code below to read a ~2.5 GB XML file as fast as possible (hence the MemoryMappedFile). However, I get the following exception: "'.', hexadecimal value 0x00, is an invalid character. Line 9778, position 73249406.". I believe it is caused by an encoding problem. How do I make sure that the MemoryMappedViewStream reads the file as UTF-8?

using System.IO;
using System.IO.MemoryMappedFiles;
using System.Xml;

class Program
{
    static void Main(string[] args)
    {
        using (var file = MemoryMappedFile.CreateFromFile(@"d:\temp\temp.xml", FileMode.Open, "MyMemMapFile"))
        {
            using (MemoryMappedViewStream stream = file.CreateViewStream())
            {
                Read(stream);
            }
        }
    }

    static void Read(Stream stream)
    {
        // The XmlReader is handed the raw bytes and detects the encoding itself.
        using (XmlReader reader = XmlReader.Create(stream))
        {
            reader.MoveToContent();

            while (reader.Read())
            {
            }
        }
    }
}
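Before changing the encoding, it may be worth checking whether the 0x00 bytes are really in the file. The documentation for `MemoryMappedFile.CreateViewStream` notes that when the size is 0 the view can be larger than the file on disk, because views are allocated in whole system pages; the bytes past the end of the file then typically read as zeros. The sketch below uses a hypothetical `ViewLengthCheck` helper (not part of the question) to compare the file size with the `Length` of a full-file view stream. It assumes the map does not need a name, so it passes `null` instead of "MyMemMapFile".

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

internal static class ViewLengthCheck
{
    // Hypothetical diagnostic helper: returns the file's size on disk and the
    // Length reported by a full-file view stream (size argument 0).
    internal static (long FileLength, long ViewLength) Compare(string path)
    {
        long fileLength = new FileInfo(path).Length;

        // Unnamed map (null), read-only; capacity 0 means "use the file size".
        using (var file = MemoryMappedFile.CreateFromFile(
                   path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        using (var stream = file.CreateViewStream(0, 0, MemoryMappedFileAccess.Read))
        {
            return (fileLength, stream.Length);
        }
    }

    private static void Main(string[] args)
    {
        // Path from the question unless one is passed on the command line.
        string path = args.Length > 0 ? args[0] : @"d:\temp\temp.xml";
        var (fileLength, viewLength) = Compare(path);
        Console.WriteLine($"file: {fileLength} bytes, view: {viewLength} bytes");

        // Any extra bytes in the view lie past the real end of the file.
        Console.WriteLine($"padding: {viewLength - fileLength} bytes");
    }
}
```

A non-zero padding value would suggest that the XmlReader is running past the end of the XML into the page padding, rather than misreading the encoding.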

Solution

  • You could wrap the stream in a StreamReader, which lets you set the encoding explicitly:

    using System.IO;
    using System.IO.MemoryMappedFiles;
    using System.Text;
    using System.Xml;

    class Program
    {
        static void Main(string[] args)
        {
            using (var file = MemoryMappedFile.CreateFromFile(@"d:\temp\temp.xml", FileMode.Open, "MyMemMapFile"))
            {
                using (MemoryMappedViewStream stream = file.CreateViewStream())
                {
                    Read(stream);
                }
            }
        }

        static void Read(Stream stream)
        {
            // Decode the bytes explicitly as UTF-8 before handing text to the XmlReader.
            using (XmlReader reader = XmlReader.Create(new StreamReader(stream, Encoding.UTF8)))
            {
                reader.MoveToContent();

                while (reader.Read())
                {
                }
            }
        }
    }
    

    Hope this helps.
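If the 0x00 exception persists with the StreamReader, one possible cause is that the view is larger than the file (views are allocated in whole system pages), so the reader reaches zero padding after the real end of the XML. A minimal sketch, assuming that is the problem: pass the file size from `FileInfo.Length` as the explicit `size` of the view, so the stream ends where the file ends. `BoundedView.CountElements` and the small test file written in `Main` are hypothetical, for illustration only; the loop stands in for whatever processing the original `Read` method does.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;
using System.Xml;

internal static class BoundedView
{
    // Reads an XML file through a memory-mapped view whose length is limited
    // to the real file size, so the XmlReader never sees page-padding bytes.
    internal static int CountElements(string path)
    {
        long fileLength = new FileInfo(path).Length;

        using (var file = MemoryMappedFile.CreateFromFile(
                   path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        // Explicit size instead of 0: the view stream stops at the end of the file.
        using (var stream = file.CreateViewStream(0, fileLength, MemoryMappedFileAccess.Read))
        using (var text = new StreamReader(stream, Encoding.UTF8))
        using (var reader = XmlReader.Create(text))
        {
            int count = 0;
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    count++;
            }
            return count;
        }
    }

    private static void Main()
    {
        // Small UTF-8 (no BOM) test file with a non-ASCII character.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "<root><item/><item>é</item></root>", new UTF8Encoding(false));
        try
        {
            Console.WriteLine(CountElements(path)); // prints 3
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Note that an empty file cannot be memory-mapped (`CreateFromFile` throws), so a real program would check for a zero length first.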