c# xml encoding utf-8 xml-deserialization

XML Unicode deserialization


I have an XML file like the following:

<?xml version="1.0" encoding="UTF-8"?>
<students xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
     <student name="Adnand"/>
     <student name="özil"/>
     <student name="ärnold"/>
</students>

As you can see, the file declares UTF-8 encoding, but it contains some non-ASCII characters (ö, ä).

I use the following code to deserialize this XML:

public void readXML(string path)
{
    XmlSerializer deserializer = new XmlSerializer(typeof(Students));
    TextReader reader = new StreamReader(path);       
    object obj = deserializer.Deserialize(reader);
    Students myStudents = (Students)obj;
}

The deserialization itself works, but the special characters are shown as the � symbol. I tried changing the encoding type, but nothing changed. Can someone tell me what alternatives I have?

ANSWER You should specify Encoding.Default, like this:

public void readXML(string path)
{
    XmlSerializer deserializer = new XmlSerializer(typeof(Students));
    // On .NET Framework, Encoding.Default is the system's ANSI code page.
    using (TextReader reader = new StreamReader(path, Encoding.Default))
    {
        Students myStudents = (Students)deserializer.Deserialize(reader);
    }
}
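
If the file's real encoding is known, it can be passed explicitly instead of relying on Encoding.Default. The following is only a minimal sketch: the Students and Student classes are hypothetical stand-ins (the original types are not shown), and Windows-1252 in the usage comment is an assumption about how the file was saved.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical classes matching the XML in the question; the original
// Students type is not shown, so these names and attributes are assumptions.
[XmlRoot("students")]
public class Students
{
    [XmlElement("student")]
    public List<Student> Items { get; set; } = new List<Student>();
}

public class Student
{
    [XmlAttribute("name")]
    public string Name { get; set; }
}

public static class StudentReader
{
    public static Students ReadXml(string path, Encoding encoding)
    {
        var serializer = new XmlSerializer(typeof(Students));
        // The StreamReader decodes the bytes with the given encoding. Because
        // the serializer receives already-decoded text, the encoding="UTF-8"
        // declared in the XML prolog is not used for decoding.
        using (var reader = new StreamReader(path, encoding))
        {
            return (Students)serializer.Deserialize(reader);
        }
    }
}

// Usage, assuming the file was saved as Windows-1252:
//   // On .NET Core / .NET 5+, register the code page provider first:
//   // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
//   Students s = StudentReader.ReadXml(path, Encoding.GetEncoding(1252));
```

Naming the code page makes the result independent of the machine's settings, whereas Encoding.Default can differ from one system to the next.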

Solution

  • It seems your file is not encoded as UTF-8 but in Windows' default ANSI encoding.

    Defining the StreamReader as

    TextReader reader = new StreamReader(path, Encoding.Default);
    

    should do the trick.


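    Note that this is more of a workaround, and using Encoding.Default is actually a very bad idea, since it breaks as soon as the code runs on a machine with a different default code page (for example, under another culture). On .NET Core and .NET 5+, Encoding.Default is always UTF-8, so the workaround has no effect there. This article gives a nice overview of why you should not use Encoding.Default (thanks to Alexander for sharing). It's better to use UTF-8, as most systems can deal with it.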

    In your specific case, to actually save the file as UTF-8, you have to do one of the following:

    • Adapt the program that creates the file to output it as UTF-8

    • Or if you used a text editor to create the file, use a text editor that supports UTF-8 encoding (e.g. Notepad++).
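
As an illustration of the first option, an existing file could also be re-encoded once from code. This is a rough sketch, not part of the original answer: the EncodingConverter class and ConvertToUtf8 method are hypothetical names, and the caller has to know the encoding the file was really saved in.

```csharp
using System.IO;
using System.Text;

public static class EncodingConverter
{
    // Re-encodes a text file to UTF-8 in place.
    public static void ConvertToUtf8(string path, Encoding sourceEncoding)
    {
        // Decode the existing bytes with the encoding the file was saved in.
        string text = File.ReadAllText(path, sourceEncoding);
        // Write the same text back as UTF-8; 'false' omits the byte order mark.
        File.WriteAllText(path, text, new UTF8Encoding(false));
    }
}
```

After such a conversion, the file content matches its encoding="UTF-8" declaration, and the original new StreamReader(path), which defaults to UTF-8, should read ö and ä correctly.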