
Properly reading a nested XML document using XmlReader.ReadInnerXml in .NET 3.5


I'm using XmlReader.ReadInnerXml to read an XML document (as text) embedded within an element of an outer XML document. This works fine except for the handling of tab characters in attributes of the inner XML. Example:

<document>
  <interface>
    <scriptaction script="&#x9;one tab&#xD;&#xA;&#x9;&#x9;two tabs&#xD;&#xA;&#x9;&#x9;&#x9;three tabs" />
  </interface>
</document>

When ReadInnerXml is called at the "document" element level, the resulting string looks like this:

<interface><scriptaction script=" one tab&#xD;&#xA;  two tabs&#xD;&#xA;   three tabs"/></interface>

IOW, the tab character references (&#x9;) are turned into literal tab characters. When we then parse the resulting inner document, those literal tabs are normalized to spaces by the usual attribute whitespace handling, so the original tabs are lost. We need to preserve the attribute values as they are.
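
The two-step effect described above is not specific to .NET, since it follows from XML attribute-value normalization. As a rough illustration only, here is a minimal sketch in Python, where the standard-library `xml.etree.ElementTree` parser stands in for the second parse. The shortened `scriptaction` strings are assumptions made for the example, not the real document:

```python
import xml.etree.ElementTree as ET

# Tabs and the line break written as character references, as in the outer document.
escaped = '<scriptaction script="&#x9;one tab&#xD;&#xA;&#x9;&#x9;two tabs" />'

# The same attribute after the tabs have been written out as literal tab characters,
# which is roughly what the ReadInnerXml output amounts to.
literal = '<scriptaction script="\tone tab&#xD;&#xA;\t\ttwo tabs" />'

from_escaped = ET.fromstring(escaped).get("script")
from_literal = ET.fromstring(literal).get("script")

print(repr(from_escaped))  # '\tone tab\r\n\t\ttwo tabs'  (tabs preserved)
print(repr(from_literal))  # ' one tab\r\n  two tabs'     (tabs normalized to spaces)
```

The character references survive parsing, while the literal tabs are replaced by spaces, which matches the behavior observed with the nested document.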

We've tried messing with various XmlReader settings to no avail. Is this possibly a defect in the reader, or something we're doing wrong?

Thanks in advance,

-- Nathan Allan - Database Consulting Group


Solution

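  • I'm afraid this behaviour is required by the XML spec (http://www.w3.org/TR/REC-xml/#AVNormalize): literal tab, carriage-return and line-feed characters in an attribute value are normalized to spaces, while the same characters written as character references (such as &#x9;) are preserved.

    Do you control the XML generation? If so, can you move the script into element content and use a CDATA section instead of an attribute?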
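
To show why the CDATA suggestion helps, here is a minimal sketch in Python (used only as a compact, parser-independent illustration). It assumes a hypothetical layout in which the script is moved out of the `script` attribute and into the content of `scriptaction`. Attribute-value normalization does not apply to element content, so the tabs come through unchanged. Literal CR LF pairs in content are still normalized to a single LF by any conforming parser:

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: the script lives in element content inside a CDATA section
# instead of in the "script" attribute.
doc = (
    "<interface><scriptaction><![CDATA["
    "\tone tab\r\n\t\ttwo tabs\r\n\t\t\tthree tabs"
    "]]></scriptaction></interface>"
)

script = ET.fromstring(doc).find("scriptaction").text

print(repr(script))  # '\tone tab\n\t\ttwo tabs\n\t\t\tthree tabs'
```

If the exact CR LF line endings matter, they would need to be restored after parsing or written as character references outside the CDATA section.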