
How to get 'Excel' new lines in SpreadsheetML, and the behaviour of node-set() with disable-output-escaping (Saxon, XSLT 1.0)


This is a follow-up to "how to get 'excel' new lines in spreadsheetML (MSXSLT)", asked as a new question because the behaviour seems to differ between engines. I'll leave the specific context in the other question; this one is purely about how to achieve a particular functional result.

This XSLT (run with Saxon-HE) produces what I want:

<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:template match="/">
        <root>
            <bar>
                <xsl:text disable-output-escaping="yes">&amp;#10;</xsl:text>
            </bar>
        </root>
    </xsl:template>
</xsl:stylesheet>

and gives the output

<root>
   <bar>&#10;</bar>
</root>

This one won't:

<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    version="1.0">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:template match="/">
        <xsl:variable name="foo">
            <bar>
                <xsl:text disable-output-escaping="yes">&amp;#10;</xsl:text>
            </bar>
        </xsl:variable>
        <root>
            <xsl:copy-of select="exsl:node-set($foo)"/>
        </root>
    </xsl:template>
</xsl:stylesheet>

It gives:

   <bar>&amp;#10;</bar>

(The question is about XSLT 1.0, but interestingly XSLT 3.0 can be made to work like this:

<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:template match="/">
        <xsl:variable name="foo">
            <bar>
                <xsl:text disable-output-escaping="yes">&amp;#10;</xsl:text>
            </bar>
        </xsl:variable>
        <root>
            <xsl:sequence select="$foo"/>
        </root>
    </xsl:template>
</xsl:stylesheet>

whilst

        <xsl:copy-of select="$foo"/>

doesn't. Even following the 'sequence' pattern, I can't seem to preserve the unescaped output in anything but a trivial XSLT. I have a complex transformation using call-template/apply-templates etc., and understanding how nodes are interpreted and serialised is not trivial.)


Solution

  • There's actually a long history to this question, which was known in the working group as the "sticky d-o-e problem" (d-o-e being disable-output-escaping). The question is, does d-o-e have any effect when writing to a temporary tree (an xsl:variable), or is it only effective when writing to serialized output?

    The XSLT 1.0 specification is pretty clear on the matter:

    It is an error for output escaping to be disabled for a text node that is used for something other than a text node in the result tree. Thus, it is an error to disable output escaping for an xsl:value-of or xsl:text element that is used to generate the string-value of a comment, processing instruction or attribute node; it is also an error to convert a result tree fragment to a number or a string if the result tree fragment contains a text node for which escaping was disabled. In both cases, an XSLT processor may signal the error; if it does not signal the error, it must recover by ignoring the disable-output-escaping attribute.

    XSLT 2.0 deprecated d-o-e, but retained the rule in a slightly different form:

    This [property], however, can be set only within a final result tree that is being passed to the serializer.

    But in between those two versions, the working group dithered. The XSLT 1.1 working draft (which never became a recommendation, but was popularised by the first version of my XSLT book) says:

    When a root node is copied using an xsl:copy-of element ... and escaping was disabled for a text node descendant of that root node, then escaping should also be disabled for the resulting copy of that text node. For example

    <xsl:variable name="x">
      <xsl:text disable-output-escaping="yes">&lt;</xsl:text>
    </xsl:variable>
    <xsl:copy-of select="$x"/>
    

    This is the "sticky d-o-e" - the d-o-e property is attached to the text node in the temporary tree and springs into life when the text node is eventually serialized. So this behaviour was endorsed at some stage in the life of XSLT, and you may be using a processor that implements this version of the spec.
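
    If your processor does not implement sticky d-o-e, one possible XSLT 1.0 workaround is to keep d-o-e out of the temporary tree altogether and apply it only while writing to the final result tree. The following is a minimal sketch rather than a tested recipe for any particular Saxon release: the `nl` marker element and the `emit` mode are hypothetical names introduced for illustration, and it assumes `exsl:node-set()` is available, as in the question.

    ```xml
    <xsl:stylesheet
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:exsl="http://exslt.org/common"
        exclude-result-prefixes="exsl"
        version="1.0">
        <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

        <xsl:template match="/">
            <!-- The temporary tree carries only a marker element (hypothetical name "nl"),
                 so no text node in it depends on disable-output-escaping. -->
            <xsl:variable name="foo">
                <bar><nl/></bar>
            </xsl:variable>
            <root>
                <!-- Re-process the temporary tree while writing to the final result tree. -->
                <xsl:apply-templates select="exsl:node-set($foo)/*" mode="emit"/>
            </root>
        </xsl:template>

        <!-- Identity copy for everything except the marker. -->
        <xsl:template match="@*|node()" mode="emit">
            <xsl:copy>
                <xsl:apply-templates select="@*|node()" mode="emit"/>
            </xsl:copy>
        </xsl:template>

        <!-- d-o-e is applied here, on a text node written directly to the final result tree. -->
        <xsl:template match="nl" mode="emit">
            <xsl:text disable-output-escaping="yes">&amp;#10;</xsl:text>
        </xsl:template>
    </xsl:stylesheet>
    ```

    The marker must not clash with an element name that is legitimately part of the output, and the approach still relies on the processor serializing the result itself, since d-o-e has no effect when the result tree is handed to something other than the serializer.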

    Generally, though, try to forget that d-o-e exists. Whatever the problem, it's not the best solution. It's an incredibly messy feature because it breaks the architectural boundary between the transformation processor and the serializer. Breaking that boundary couples transformation and serialization closely, and prevents you from reusing the same code in a different pipeline configuration.
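
    As one illustration of an approach that stays on the serializer side of that boundary, XSLT 2.0 and later offer character maps. The sketch below is not an answer for a pure XSLT 1.0 processor, and the choices in it are assumptions: the private-use character U+E000 is an arbitrary placeholder, and `excel-newline` is a hypothetical map name.

    ```xml
    <xsl:stylesheet
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        version="2.0">
        <!-- The character map is attached to the output definition, so it is applied
             by the serializer and not by the transformation itself. -->
        <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"
            use-character-maps="excel-newline"/>

        <!-- Whenever U+E000 (an arbitrary private-use placeholder) is serialized,
             emit the literal string &#10; with no further escaping. -->
        <xsl:character-map name="excel-newline">
            <xsl:output-character character="&#xE000;" string="&amp;#10;"/>
        </xsl:character-map>

        <xsl:template match="/">
            <!-- The temporary tree holds an ordinary character, so copying it
                 loses nothing. -->
            <xsl:variable name="foo">
                <bar><xsl:text>&#xE000;</xsl:text></bar>
            </xsl:variable>
            <root>
                <xsl:copy-of select="$foo"/>
            </root>
        </xsl:template>
    </xsl:stylesheet>
    ```

    Because the placeholder is just a character in a text node, it survives variables, `xsl:copy-of`, and template calls unchanged. The substitution happens only at serialization time, so the same transformation code can be reused in a pipeline that does not serialize at all.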

    I'm afraid that researching the history of the W3C spec on this is rather easier than researching exactly what was implemented in early versions of Saxon (which are now nearly a quarter of a century old).