Tags: xml, xslt, utf-8, cdata

How to extract text from CDATA with XSLT?


I have some XML-unfriendly characters in my XML files. Some XML parsers can be configured to tolerate those characters, but I've decided to wrap them in CDATA sections to avoid XSLT processor errors. Now I have to modify my XSL somehow. Here is what I have now:

<subject>
   <![CDATA[svn commit: r41657 - head/en_US.ISO8859-1/books/handbook/basics]]>
</subject>

I have a variable:

<xsl:variable name="message_subject">
<xsl:text> “</xsl:text>
<xsl:value-of select="/browser/message/subject"/>
<xsl:text>”</xsl:text>
</xsl:variable>

It is used this way:

<h1>
  <xsl:copy-of select="$message_subject"/>
</h1>

and it gives me:

<h1>
   “<![CDATA[svn commit: r41657 - head/en_US.ISO8859-1/books/handbook/basics]]>”
</h1>

The problem is that the CDATA markup is mixed into the desired string. I use net.sf.saxon.TransformerFactoryImpl. How can I make XSLT take only the contents of the CDATA section?


Solution

(a) There is nothing in your XML that requires CDATA. The only characters in XML that need escaping are & and <, and neither of these appears in your data.

(b) XSLT sees the data after the parser has stripped the CDATA delimiters. In your example, it will see exactly the same content as if the CDATA delimiters were not there.

The output you show is very strange, and I don't know how you are producing it. I don't know why you have chosen such an old version of Saxon, but I doubt that is the explanation. Something else is going on that we don't know about.
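Point (b) can be checked locally with a small sketch. It makes a few assumptions. It uses the JDK's built-in JAXP XSLT processor, not Saxon. The class name `CdataDemo` and its `transform` helper are hypothetical. The input documents are also hypothetical; their `/browser/message/subject` structure is inferred from the question's `select` expression. The stylesheet is adapted from the question's variable and `<h1>` template. The sketch runs the same subject through the stylesheet twice, once wrapped in CDATA and once as plain text, and compares the two results.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CdataDemo {
    // Stylesheet adapted from the question: builds the quoted subject
    // in a variable, then copies it into an <h1>.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes' encoding='UTF-8'/>"
        + "<xsl:variable name='message_subject'>"
        + "<xsl:text>\u201C</xsl:text>"
        + "<xsl:value-of select='/browser/message/subject'/>"
        + "<xsl:text>\u201D</xsl:text>"
        + "</xsl:variable>"
        + "<xsl:template match='/'>"
        + "<h1><xsl:copy-of select='$message_subject'/></h1>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical helper: runs XSL over an in-memory XML string
    // using whatever default JAXP TransformerFactory is on the classpath.
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String subject = "svn commit: r41657 - head/en_US.ISO8859-1/books/handbook/basics";
        // Same content, once inside a CDATA section and once as plain text.
        String withCdata = "<browser><message><subject><![CDATA[" + subject
                + "]]></subject></message></browser>";
        String plain = "<browser><message><subject>" + subject
                + "</subject></message></browser>";

        String a = transform(withCdata);
        String b = transform(plain);
        System.out.println(a);
        // The parser removes the CDATA delimiters, so both runs should match.
        System.out.println("identical: " + a.equals(b));
    }
}
```

If both runs print the same `<h1>` with no `<![CDATA[` in it, the parser and processor are handling CDATA as described. In that case the literal CDATA markup in the question's output most likely enters the pipeline somewhere outside the transformation shown.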