c# utf-8 linq-to-xml xmldocument

Read XML with embedded Arabic data in C#


I am trying to load an XML file that contains a mix of ASCII text and Arabic characters. Here is the top snippet:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE TS>
<TS version="2.1" language="ar_EG">
<context>
    <message>
        <location filename="ui/aboutdialog.cpp" line="90"/>
        <source>You have </source>
        <translation type="unfinished">يوجد لديك</translation>
    </message>
    <message>
        <location filename="ui/aboutdialog.cpp" line="90"/>
        <source> launches left</source>
        <translation type="unfinished">عدد التشغيلات المتبقية</translation>
    </message>
</context>

I want to load this into a C# TreeView, but I am having trouble loading it into an XDocument or XmlDocument.

Using this:

XDocument xd = XDocument.Load(File.ReadAllText(tbxTSFileName.Text));

or

XDocument xd = XDocument.Load(File.ReadAllText(tbxTSFileName.Text, Encoding.GetEncoding(874)));

Gives me an "Invalid URI: Uri string is too long" error.

Using this:

XmlDocument xd = new XmlDocument();
xd.Load(tbxTSFileName.Text);

Gives the error "Invalid character in the given encoding. Line 9 position 40".


Solution

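  • Read the documentation for the method you're calling.

    XDocument.Load(string) takes a URI or file path, not XML text. Passing it the result of File.ReadAllText makes it treat the entire file contents as a URI, which is why you get the "Uri string is too long" error.

    To parse XML that is already in a string, use XDocument.Parse. Alternatively, pass the file path itself to XDocument.Load.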
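
As a rough illustration of the difference between the two calls, here is a minimal sketch. The class name TsLoadDemo, the helper names, and the temporary file sample_ar.ts are hypothetical. The sample is a shortened copy of the question's XML, closed with </TS> so that it is well-formed, and the sketch assumes the file on disk really is UTF-8.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class TsLoadDemo
{
    // Shortened copy of the question's file (hypothetical sample data),
    // with a closing </TS> added so the document is well-formed.
    public const string SampleXml =
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n" +
        "<!DOCTYPE TS>\n" +
        "<TS version=\"2.1\" language=\"ar_EG\">\n" +
        "<context>\n" +
        "    <message>\n" +
        "        <source>You have </source>\n" +
        "        <translation type=\"unfinished\">يوجد لديك</translation>\n" +
        "    </message>\n" +
        "</context>\n" +
        "</TS>\n";

    // Option 1: pass the file path (not the contents) to Load.
    public static XDocument LoadFromPath(string path)
    {
        return XDocument.Load(path);
    }

    // Option 2: read the text yourself, then hand the string to Parse.
    // Assumes the file is UTF-8, as its XML declaration says.
    public static XDocument LoadFromText(string path)
    {
        string text = File.ReadAllText(path, Encoding.UTF8);
        return XDocument.Parse(text);
    }

    public static void Main()
    {
        // Hypothetical path standing in for tbxTSFileName.Text.
        string path = Path.Combine(Path.GetTempPath(), "sample_ar.ts");
        File.WriteAllText(path, SampleXml, new UTF8Encoding(false));

        // Both documents should expose the same <translation> elements.
        foreach (XDocument doc in new[] { LoadFromPath(path), LoadFromText(path) })
        {
            foreach (XElement t in doc.Descendants("translation"))
            {
                Console.WriteLine(t.Value);
            }
        }
    }
}
```

Either option should yield the same tree. Neither fixes a file whose bytes do not match its declared encoding. The "Invalid character in the given encoding" error from XmlDocument.Load may point to that situation, which this sketch does not cover.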