I'm trying to transform XML:
<catalog>
<country><![CDATA[ WIN8 &lt;b&gt;X&lt;/b&gt; Mac OS ]]></country>
</catalog>
into
<catalog>
<country><![CDATA[ WIN8 <b>X</b> Mac OS ]]></country>
</catalog>
with an XSL transform.
I know that with disable-output-escaping="yes" or cdata-section-elements I can turn escaped characters into unescaped ones and put them inside CDATA, but this does not work if the characters are already inside CDATA.
Is there a simple way to do this? Thanks.
This
<catalog>
<country><![CDATA[ WIN8 <b>X</b> Mac OS ]]></country>
</catalog>
is equivalent to
<catalog>
<country> WIN8 &lt;b&gt;X&lt;/b&gt; Mac OS </country>
</catalog>
Which is exactly what you get when using
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes" />
  <!-- identity template: copy every node and attribute unchanged -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>
  <!-- write the text of <country> without output escaping -->
  <xsl:template match="country/text()">
    <xsl:value-of select="." disable-output-escaping="yes" />
  </xsl:template>
</xsl:stylesheet>
The point is that disable-output-escaping (DOE) has no effect in an element that falls into cdata-section-elements (CSE). That's because both directives disable output escaping.
The text value " WIN8 &lt;b&gt;X&lt;/b&gt; Mac OS " becomes:
when serialized normally: WIN8 &amp;lt;b&amp;gt;X&amp;lt;/b&amp;gt; Mac OS
when serialized with CSE: <![CDATA[ WIN8 &lt;b&gt;X&lt;/b&gt; Mac OS ]]>
when serialized with DOE: WIN8 &lt;b&gt;X&lt;/b&gt; Mac OS
Note how the last two renderings are exactly the same, except for the enclosing <![CDATA[ ... ]]>.
CSE disables output escaping for the text node children of an element and, in exchange, encloses them in <![CDATA[ ... ]]> markers to make up for the lost level of escaping.
If you additionally set DOE on an <xsl:value-of> that writes text into an element that has CSE set, nothing happens: output escaping is already disabled.
Therefore this
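<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes" />
  <!-- CSE: serialize the text children of <country> as CDATA sections -->
  <xsl:output cdata-section-elements="country" />
  <!-- identity template: copy every node and attribute unchanged -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>
  <!-- DOE is requested here, but it changes nothing because CSE already applies -->
  <xsl:template match="country/text()">
    <xsl:value-of select="." disable-output-escaping="yes" />
  </xsl:template>
</xsl:stylesheet>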
will give you exactly what your input was.
That's why you cannot get rid of the double escaping and have CDATA in the same transformation. You could use a two-step approach (the 1st step disables output escaping, the 2nd step adds back CDATA) if you positively must have CDATA in the result document, but personally I think it's not worth it.
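For completeness, the second step of that two-step approach could look roughly like the sketch below. It assumes the first step is the DOE-only stylesheet from the start of this answer (the one without cdata-section-elements), and that the first step's output is actually serialized and then parsed again as a new input document. DOE is a serialization feature, so chaining both steps on an in-memory result tree will most likely not work.

```xml
<!-- Step 2 (sketch): run this on the serialized output of step 1 -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- wrap the text node children of <country> in CDATA sections -->
  <xsl:output omit-xml-declaration="yes" cdata-section-elements="country" />
  <!-- identity template: copy every node and attribute unchanged -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Since step 1 has already removed one level of escaping, the parser hands this stylesheet the unescaped text, and CSE then writes it inside <![CDATA[ ... ]]>. This only holds as long as the unescaped content does not turn into real child elements of <country> when step 1's output is parsed; text that does become markup would be copied as elements, with only the text around it wrapped in CDATA.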