Tags: linq, c#-4.0, cdata

Parse CData from XML in C#


I am trying to parse XML in which one node's value is a CDATA section. The XML is structured as follows.

<node1>
<node2>
<![CDATA[ <!--@@@BREAK TYPE="TABLE" @@@--> <P><CENTER>... html goes here.. ]]>
</node2>
</node1>

My code is below. When I parse the XML, the result still includes the CDATA markup rather than just the value inside the CDATA section. Can you please help me fix this?

XDocument xmlDoc = XDocument.Parse(responseString);
XElement node1Element = xmlDoc.Descendants("node1").FirstOrDefault();
string cdataValue = node1Element.Element("node2").Value;

Actual Output: <![CDATA[ <!--@@@BREAK TYPE="TABLE" @@@--> <P><CENTER>... html goes here.. ]]>

Expected Output:  <!--@@@BREAK TYPE="TABLE" @@@--> <P><CENTER>... html goes here..

I was not sure whether System.Xml.Linq.XDocument was causing the problem, so I also tried the XmlDocument version below.

XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(responseString);
XmlNode node = xmlDoc.DocumentElement.SelectSingleNode(@"/node1/node2");
XmlNode childNode = node.ChildNodes[0];
if (childNode is XmlCDataSection)
{}

The if condition evaluates to false. So it looks like something is wrong with my XML and it is not actually a valid CDATA section? Please help me fix the problem, and let me know if you need more details.


Solution

  • It turned out that the HTML in the response I was reading through StreamReader was escaped, so "<" had been changed to "&lt;". As a result, the content was not recognized as a CDATA section. I had to unescape the string first:

    XDocument xmlDoc = XDocument.Parse(HttpUtility.HtmlDecode(responseString));

    and that fixed it.
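
For anyone wanting to reproduce this, here is a minimal sketch of the same fix. It assumes the response looks like the question's XML with only the CDATA part escaped. The class name CDataExample and the helpers ReadNode2 and HasCData are hypothetical names made up for this illustration, and it uses System.Net.WebUtility.HtmlDecode in place of HttpUtility.HtmlDecode so that it compiles without a reference to System.Web.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Xml.Linq;

// Hypothetical helper class for illustration; not part of the original post.
public static class CDataExample
{
    // Unescapes the response first, then returns the text content of node2.
    // WebUtility.HtmlDecode is swapped in for HttpUtility.HtmlDecode here.
    public static string ReadNode2(string responseString)
    {
        string decoded = WebUtility.HtmlDecode(responseString);
        XDocument xmlDoc = XDocument.Parse(decoded);
        XElement node2 = xmlDoc.Descendants("node2").FirstOrDefault();
        return node2 == null ? null : node2.Value;
    }

    // Diagnostic: reports whether node2 holds a real CDATA node (XCData)
    // in the XML exactly as given, without any unescaping.
    public static bool HasCData(string xml)
    {
        XElement node2 = XDocument.Parse(xml).Descendants("node2").FirstOrDefault();
        return node2 != null && node2.Nodes().OfType<XCData>().Any();
    }
}
```

With an escaped response, HasCData should return false on the raw string and true once the string has been passed through WebUtility.HtmlDecode, which is the same symptom as the XmlCDataSection check in the question. One caveat: decoding the whole response also unescapes any entities that were legitimately escaped elsewhere in the document (for example "&amp;"), which could make it fail to parse, so this is only safe when the escaped CDATA section is the only escaped content.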