.net xml c#-2.0 whitespace cdata

How to read CDATA XML Content


I have the following xml file:

<?xml version="1.0" encoding="utf-8"?>
    <root>
<phrase id="test"><![CDATA[test]]></phrase>
<phrase id="test0"><![CDATA[test0]]></phrase>
<phrase id="test2"><![CDATA[test2]]></phrase>
<phrase id="test3">test3</phrase>
<phrase id="test4">
    <![CDATA[test4
LINEBREAK]]>
</phrase>
<phrase id="test5">
LINEBREAK</phrase>
<phrase id="test6"><![CDATA[test6]]></phrase>
<phrase id="test7">
    <![CDATA[test7
ANOTHER LINEBREAK]]>
</phrase>
</root>

As you can see, the elements can contain CDATA sections to wrap line breaks and spaces correctly. The problem is that, if I use the following code, the line breaks and tabs before and after the CDATA section are captured as well.

So I decided to set IgnoreWhitespace = true, but this skips every second node. Why is that?

XmlReaderSettings sett = new XmlReaderSettings();
sett.IgnoreWhitespace = true;
using (XmlReader r = XmlTextReader.Create(filePath, sett))
{
    while (r.ReadToFollowing("phrase"))
    {
        string attrib = r.GetAttribute("id").ToLowerInvariant();
        string content = r.ReadElementContentAsString();
    }
}

Please note that my project is limited to .NET 2.0.


Solution

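  • Use ReadString instead of ReadElementContentAsString. ReadElementContentAsString moves the reader past the closing </phrase> tag, so with IgnoreWhitespace = true the reader is already positioned on the next <phrase> element, and the following ReadToFollowing call skips over it. ReadString stops on the closing tag, so ReadToFollowing then finds the next element as expected: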

    while (r.ReadToFollowing("phrase"))
    {
        string attrib = r.GetAttribute("id").ToLowerInvariant();
        string content = r.ReadString();
    }
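
    For anyone who wants to see both behaviours side by side, here is a minimal sketch that should compile under C# 2.0. The class name PhraseReader, the method ReadPhrases, the useReadString flag and the shortened SampleXml string are all made up for this illustration and are not part of the question's code; the sketch also reads from a string rather than from filePath, and skips phrases without an id attribute, which is an assumption about the desired behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper for illustration only; names are not from the question.
public static class PhraseReader
{
    // Shortened version of the question's XML (assumed sample data).
    public const string SampleXml =
        "<root>\n" +
        "<phrase id=\"test\"><![CDATA[test]]></phrase>\n" +
        "<phrase id=\"test3\">test3</phrase>\n" +
        "<phrase id=\"test4\">\n" +
        "    <![CDATA[test4\nLINEBREAK]]>\n" +
        "</phrase>\n" +
        "</root>";

    // Returns id -> content for every <phrase> element the loop visits.
    // useReadString = true  : ReadString (the suggested fix)
    // useReadString = false : ReadElementContentAsString (the question's code)
    public static Dictionary<string, string> ReadPhrases(string xml, bool useReadString)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        // Drops the whitespace-only nodes before and after the CDATA sections.
        settings.IgnoreWhitespace = true;

        Dictionary<string, string> phrases = new Dictionary<string, string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.ReadToFollowing("phrase"))
            {
                string id = reader.GetAttribute("id");
                if (id == null)
                {
                    continue; // assumption: phrases without an id are ignored
                }

                // ReadString leaves the reader on </phrase>;
                // ReadElementContentAsString moves past it, onto the next node.
                string content = useReadString
                    ? reader.ReadString()
                    : reader.ReadElementContentAsString();

                phrases[id.ToLowerInvariant()] = content;
            }
        }
        return phrases;
    }

    public static void Main()
    {
        Console.WriteLine("ReadString: " + ReadPhrases(SampleXml, true).Count + " phrases");
        Console.WriteLine("ReadElementContentAsString: " + ReadPhrases(SampleXml, false).Count + " phrases");
    }
}
```

    With the flag set to true, all three phrases in the sample are returned and the test4 entry contains only the CDATA text, without the surrounding line break and indentation. With the flag set to false, the test3 element is skipped, which matches the "every second node" behaviour described in the question.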