
Decode CDATA section in C#


I have a bit of XML as follows:

<section>
  <description>
    <![CDATA[
      This is a "description"
      that I have formatted
    ]]>
  </description>
</section>

I'm accessing it with curXmlNode.SelectSingleNode("description").InnerText, but it returns

\r\n      This is a "description"\r\n      that I have formatted
instead of
This is a "description" that I have formatted.

Is there a simple way to get that sort of output from a CDATA section? Leaving out the CDATA tag itself seems to return the same result.


Solution

  • You can use LINQ to XML to read the CDATA section.

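    // Requires: System.Linq, System.Xml.Linq
    XDocument xDoc = XDocument.Load("YourXml.xml");
    // Each XCData node exposes its text through the Value property.
    string value = xDoc.DescendantNodes().OfType<XCData>().First().Value;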
    

    It's very easy to get the Value this way.

    Here's a good overview on MSDN: http://msdn.microsoft.com/en-us/library/bb308960.aspx
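
As a minimal sketch of the LINQ to XML route applied to the XML from the question: the class name `CDataLinqSketch` and the helper `ReadFirstCData` are made up for illustration, and the whitespace collapsing is an extra step added here, since reading `Value` alone still returns the line breaks and indentation.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

static class CDataLinqSketch
{
    // Hypothetical helper: returns the text of the first CDATA section
    // with leading/trailing whitespace removed and inner runs collapsed.
    public static string ReadFirstCData(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // CDATA sections are represented as XCData nodes.
        XCData cdata = doc.DescendantNodes().OfType<XCData>().FirstOrDefault();
        if (cdata == null)
        {
            return null; // no CDATA section in the document
        }

        // \s matches spaces, tabs, \r and \n, so one pass handles all of them.
        return Regex.Replace(cdata.Value.Trim(), @"\s+", " ");
    }

    static void Main()
    {
        string xml =
            "<section><description><![CDATA[\n" +
            "      This is a \"description\"\n" +
            "      that I have formatted\n" +
            "    ]]></description></section>";

        Console.WriteLine(ReadFirstCData(xml));
    }
}
```

This needs .NET 3.5 or later, where LINQ to XML is available.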

    For .NET 2.0, which has no LINQ, you probably just have to pass the value through a Regex:

        // Requires: System, System.IO, System.Text.RegularExpressions, System.Xml.XPath
        string xml = @"<section>
                         <description>
                           <![CDATA[
                             This is a ""description""
                             that I have formatted
                           ]]>
                         </description>
                       </section>";

        XPathDocument xDoc = new XPathDocument(new StringReader(xml.Trim()));
        XPathNavigator nav = xDoc.CreateNavigator();
        XPathNavigator descriptionNode =
            nav.SelectSingleNode("/section/description");

        // Drop newlines, trim the ends, then collapse each whitespace run to one space.
        string desiredValue =
            Regex.Replace(descriptionNode.Value
                                         .Replace(Environment.NewLine, String.Empty)
                                         .Trim(),
                @"\s+", " ");
    

    This trims the node value, removes the newlines, and collapses each run of whitespace into a single space. I don't think there's any other way to do it, since the CDATA section returns its whitespace as significant.
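
For the `XmlNode.SelectSingleNode(...).InnerText` access used in the question, the same cleanup can be sketched as below. The class name `CDataXmlDocumentSketch` and the helper `CollapseWhitespace` are hypothetical. This variant relies on `\s+` alone, on the assumption that a separate newline replacement is unnecessary because `\s` already matches `\r` and `\n`.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

static class CDataXmlDocumentSketch
{
    // Hypothetical helper: trims the ends and replaces every run of
    // whitespace (spaces, tabs, \r, \n) with a single space.
    public static string CollapseWhitespace(string text)
    {
        return Regex.Replace(text.Trim(), @"\s+", " ");
    }

    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<section><description><![CDATA[\r\n" +
            "      This is a \"description\"\r\n" +
            "      that I have formatted\r\n" +
            "    ]]></description></section>");

        // InnerText returns the CDATA content, including its line breaks and indentation.
        XmlNode description = doc.SelectSingleNode("/section/description");

        Console.WriteLine(CollapseWhitespace(description.InnerText));
    }
}
```

Keeping the cleanup in one small function means it can be applied to any node's `InnerText`, whether or not the text came from a CDATA section.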