XSLT function returns different results [Saxon-EE vs Saxon-HE/PE]


I am working on a pure XSLT transformation that runs on several versions of the Saxon processor. Below is my stylesheet, simplified for the purposes of this question:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:foo="bar">

    <xsl:output encoding="UTF-8" method="text"/>

    <xsl:template match="/">
        <xsl:text>Call of func_1: </xsl:text>        
        <xsl:value-of select="foo:func_1()"/>

        <xsl:text>&#xA;Call of func_1: </xsl:text>
        <xsl:value-of select="foo:func_1()"/>

        <xsl:text>&#xA;Call of func_1: </xsl:text>
        <xsl:value-of select="foo:func_1()"/>

        <xsl:text>&#xA;Call of func_2: </xsl:text>
        <xsl:value-of select="foo:func_2()"/>
    </xsl:template>

    <xsl:function name="foo:func_1" as="xs:string">
        <!-- do some other stuff -->
        <xsl:value-of select="foo:func_2()"/>
    </xsl:function>

    <xsl:function name="foo:func_2" as="xs:string">
        <xsl:variable name="node">
            <xsl:comment/>
        </xsl:variable>
        <xsl:sequence select="generate-id($node)"/>
    </xsl:function>

</xsl:stylesheet>

Description

foo:func_1 is a wrapper function that returns the value of a second function and does some other work, which can be ignored here. The pattern of one function calling another is a hard requirement.

foo:func_2 generates a unique ID for a node. The node is a temporary tree (containing an empty comment) created in a locally scoped variable named "node".

Different results based on Saxon versions

Expected result:

Call of func_1: d2
Call of func_1: d3
Call of func_1: d4
Call of func_2: d5

Saxon-EE 9.6.0.7 / Saxon-EE 9.6.0.5 result

Call of func_1: d2
Call of func_1: d2
Call of func_1: d2
Call of func_2: d3

Saxon-HE 9.6.0.5 / Saxon-PE 9.6.0.5 / Saxon-EE 9.5.1.6 / Saxon-HE 9.5.1.6 result

As expected.

Question / further details

I debugged the problem as far as I could. If I change the xsl:value-of in function func_1 to xsl:sequence, the results are the same for all versions (as expected). But that is not what I want.
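For reference, the xsl:sequence variant of func_1 mentioned above would look roughly like this. It is a drop-in replacement for the function in the stylesheet shown earlier; only the instruction inside the function body changes.

```xml
<xsl:function name="foo:func_1" as="xs:string">
    <!-- do some other stuff -->
    <!-- xsl:sequence passes the string returned by foo:func_2 through as-is,
         instead of wrapping it in a new text node as xsl:value-of does -->
    <xsl:sequence select="foo:func_2()"/>
</xsl:function>
```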

I want to understand what the difference is between xsl:value-of and xsl:sequence across Saxon versions. Is there any "hidden" caching? What is the correct way to use xsl:sequence and xsl:value-of in my case? (By the way: I already know that xsl:value-of creates a text node from the result of the select expression, whereas xsl:sequence can return a reference to a node or an atomic value. As far as I know, that does not solve my problem.)


Solution

  • This is a long-standing and rather deep problem. In a pure functional language, calling a pure function twice with the same arguments always produces the same result. This makes many optimizations possible, such as pulling a function call out of a loop if the arguments are invariant, or inlining a function call if it's not recursive. Unfortunately XSLT and XQuery functions aren't quite purely functional: in particular, they are defined so that if the function creates new nodes, then calling the function twice produces different nodes (f() is f() returns false).
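
    The node-identity rule can be illustrated with a minimal sketch. The function name foo:make is hypothetical and not part of the question's stylesheet. Under the rule described above, the comparison should yield false, because each call constructs its own comment node.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:foo="bar">

        <xsl:output encoding="UTF-8" method="text"/>

        <!-- Hypothetical function: constructs and returns a new comment node -->
        <xsl:function name="foo:make" as="node()">
            <xsl:comment/>
        </xsl:function>

        <xsl:template match="/">
            <!-- "is" compares node identity, not content -->
            <xsl:value-of select="foo:make() is foo:make()"/>
        </xsl:template>

    </xsl:stylesheet>
    ```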

    The Saxon optimizer tries quite hard to optimize as far as it can within these constraints, in particular by recognizing functions that create new nodes and avoiding aggressive optimization of such functions.

    But the spec itself isn't 100% prescriptive. For example, if as in your example there is a local variable with no dependencies on function arguments, I think the spec gives license to the implementation as to whether the value of the variable is the same node on each evaluation, or is a new node.

    As Martin says, the new XSLT 3.0 attribute new-each-time is an attempt to get this under control: if you really want a new node each time the function is called, you should specify new-each-time="yes".
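
    A minimal sketch of that suggestion, applied to the stylesheet from the question. It assumes an XSLT 3.0 processor that supports the new-each-time attribute and has not been verified against any particular Saxon release. Marking only func_2 is also an assumption: it is the function that constructs the node.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="3.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:foo="bar">

        <xsl:output encoding="UTF-8" method="text"/>

        <xsl:template match="/">
            <xsl:text>Call of func_1: </xsl:text>
            <xsl:value-of select="foo:func_1()"/>
            <xsl:text>&#xA;Call of func_1: </xsl:text>
            <xsl:value-of select="foo:func_1()"/>
            <xsl:text>&#xA;Call of func_2: </xsl:text>
            <xsl:value-of select="foo:func_2()"/>
        </xsl:template>

        <xsl:function name="foo:func_1" as="xs:string">
            <xsl:value-of select="foo:func_2()"/>
        </xsl:function>

        <!-- new-each-time="yes" declares that every call must construct new nodes -->
        <xsl:function name="foo:func_2" as="xs:string" new-each-time="yes">
            <xsl:variable name="node">
                <xsl:comment/>
            </xsl:variable>
            <xsl:sequence select="generate-id($node)"/>
        </xsl:function>

    </xsl:stylesheet>
    ```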

    Note:

    The specific optimization that is happening here (which you can see by running with the -explain option) is that func_2 is first inlined, and then its body is being extracted into a global variable. Some releases are doing this and others aren't - it can be very sensitive to minor changes. The best advice is not to depend on functions having this kind of side-effect. It would help if you explained your real problem, then perhaps we could find an approach that isn't so sensitive to edge cases in the language semantics.
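
    To make the effect of that optimization concrete, here is a rough hand-written equivalent of what the optimized stylesheet behaves like. It is an illustration only, not the actual output of -explain, and the variable name foo:extracted is invented for this sketch. Because the global variable is evaluated once, every call of func_1 returns the same ID, while a direct call of func_2 still builds a new node. That matches the d2, d2, d2, d3 pattern reported in the question.

    ```xml
    <!-- Hypothetical: the inlined body of foo:func_2, hoisted into a global
         variable that is evaluated only once per transformation -->
    <xsl:variable name="foo:extracted" as="xs:string">
        <xsl:variable name="node">
            <xsl:comment/>
        </xsl:variable>
        <xsl:sequence select="generate-id($node)"/>
    </xsl:variable>

    <!-- foo:func_1 now just reads the cached value on every call -->
    <xsl:function name="foo:func_1" as="xs:string">
        <xsl:value-of select="$foo:extracted"/>
    </xsl:function>
    ```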