XProc and CDATA


I have an XSLT that creates some CDATA within a node.

XML:

<test><inner>stuff</inner></test>

XSLT:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="2.0">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="test">
        <wrapper>
                <xsl:text disable-output-escaping="yes">&lt;![CDATA[</xsl:text>
                <xsl:copy-of select="*"/>
                <xsl:text disable-output-escaping="yes">]]&gt;</xsl:text>
        </wrapper>
    </xsl:template>
</xsl:stylesheet>

This transform, executed via Saxon, returns:

<wrapper><![CDATA[<inner>stuff</inner>]]></wrapper>

I am aware that I am wrapping XML in CDATA and that this is kind of ridiculous. But this is what is expected by an API that I am working with, so I have no choice but to follow this pattern.

Now I am trying to include this transform as part of a larger XProc pipeline:

<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
    <p:xslt>
        <p:input port="stylesheet">
            <p:document href="test.xsl"/>
        </p:input>
    </p:xslt>
</p:pipeline>

Which returns (using the latest version of Calabash):

<wrapper>&lt;![CDATA[<inner>stuff</inner>]]&gt;</wrapper>

It seems that XProc doesn't honor the disable-output-escaping attribute.

I went on to try a few XProc steps, including p:unescape-markup and various combinations of p:string-replace, but I couldn't find a solution that didn't adversely affect the rest of my output.

Any ideas what I might try next?


Solution

  • An XSLT processor is not required to support disable-output-escaping (d-o-e). The XSLT 1.0 specification says:

    "An XSLT processor will only be able to disable output escaping if it controls how the result tree is output. This may not always be the case. For example, the result tree may be used as the source tree for another XSLT transformation instead of being output."

    This is especially true in pipelining: XSLT may not control serialization of the output tree, but only pass it on to the next step in the pipeline as a DOM or as SAX events. But even if it could, the specification goes on:

    "An XSLT processor is not required to support disabling output escaping. If an xsl:value-of or xsl:text specifies that output escaping should be disabled and the XSLT processor does not support this, the XSLT processor may signal an error; if it does not signal an error, it must recover by not disabling output escaping."

    So you really can't rely on d-o-e, especially in a pipeline.

    You wrote: "But this is what is expected by an API that I am working with, so I have no choice but to follow this pattern."

    I can sympathize with the situation, having used faulty tools in the past because they were the best available. However, the presence (and boundaries) of a CDATA section are explicitly not in the XML Infoset. So an API that depends on CDATA sections is faulty with regard to its XML input requirements. If it truly does depend on CDATA sections, it would be a good idea to file a bug report about it.

    On the other hand, maybe the API you're working with doesn't actually require CDATA sections; maybe it just requires that you feed it XML that's escaped in some way? If so, there are other ways to accomplish that, without requiring a specific serialization that is outside of the XML Infoset. If you can show us documentation about the API, we could help determine what it actually requires.
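
    If escaped markup (or a CDATA section produced at serialization time) turns out to be enough, one possible approach that avoids d-o-e is sketched below. It assumes test.xsl is reduced to emit <wrapper> containing a plain copy of the children, with the two xsl:text lines removed. The standard p:escape-markup step then replaces the children of <wrapper> with their serialized text, and the cdata-section-elements serialization option asks for that text to be written as a CDATA section. Whether p:serialization is honored depends on the processor and on how the result port is written out, so treat this as something to try rather than a guaranteed fix.

    ```xml
    <p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
        <!-- Request that text inside <wrapper> be serialized as CDATA.
             Only takes effect if the processor serializes this port. -->
        <p:serialization port="result" cdata-section-elements="wrapper"/>

        <!-- Assumed: test.xsl now produces <wrapper><inner>stuff</inner></wrapper>,
             with no disable-output-escaping. -->
        <p:xslt>
            <p:input port="stylesheet">
                <p:document href="test.xsl"/>
            </p:input>
        </p:xslt>

        <!-- Turn the children of <wrapper> into a single text node holding
             their serialized markup. -->
        <p:escape-markup/>
    </p:pipeline>
    ```

    If the result is written with p:store instead, the same cdata-section-elements option can be set on that step. Without the option, the text is serialized with ordinary escaping (&lt;inner&gt;...), which is equivalent at the Infoset level and may already be acceptable to the API.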