XSLT Performance Help for High Volume Data Set

I'm running an XSLT to transform a very high volume XML input (multiple millions of lines) and trying to make the transformation more efficient.

My input data looks something like this:

           <ID type='index'>ABC</ID>
           <ID type='objID'>0110</ID>
           <ID type='index'>ABC</ID>
           <ID type='objID'>0110</ID>
           <ID type='index'>XYZ</ID>
           <ID type='objID'>0221</ID>
           <ID type='index'>087</ID>
           <ID type='objID'>0330</ID>
           <ID type='index'>087</ID>
           <ID type='objID'>0330</ID>

I want to update each lineID to match the primary lineID within the same group, and copy the rest of the XML over unchanged. The output would look like this:

           <ID type='index'>ABC</ID>
           <ID type='index'>ABC</ID>
           <ID type='index'>ABC</ID>
           <ID type='index'>087</ID>
           <ID type='index'>087</ID>

This is my XSLT and it works, but it's a bit slow. I just can't figure out how to write a more efficient one. Any pointers, suggestions, or edits appreciated! I know the exists() function is not particularly efficient and I've considered going to 3.0, but I couldn't get the streamable mode or input doc to work correctly.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0">
    <xsl:output indent="yes"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match='lineID'>
        <xsl:variable name='ID' select='.'/>
        <xsl:variable name='groupID' select='../groupID'/>
        <xsl:copy>
            <xsl:choose>
                <xsl:when test='exists(../../entry[groupID = $groupID and primary = true() and lineID != $ID])'>
                    <xsl:value-of select='../../entry[groupID = $groupID and primary = true() and lineID != $ID][1]/lineID'/>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:value-of select='$ID'/>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>


  • I like the idea of using streaming with group-adjacent; given that the values you need from each entry element are in its child elements, you will need to use copy-of():

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
      expand-text="yes">
      <xsl:mode on-no-match="shallow-copy" streamable="yes"/>
      <xsl:template match="root">
        <xsl:copy>
          <xsl:for-each-group select="entry!copy-of()" group-adjacent="groupID">
            <xsl:apply-templates select="current-group()" mode="grounded">
              <xsl:with-param name="lineID" tunnel="yes" select="current-group()[primary = 'true'][1]/lineID"/>
            </xsl:apply-templates>
          </xsl:for-each-group>
        </xsl:copy>
      </xsl:template>
      <xsl:mode name="grounded" on-no-match="shallow-copy"/>
      <xsl:template match="lineID" mode="grounded">
        <xsl:param name="lineID" tunnel="yes"/>
        <xsl:copy>{($lineID, .)[1]}</xsl:copy>
      </xsl:template>
      <xsl:output indent="yes"/>
    </xsl:stylesheet>

    An accumulator alone will not help, in my view, if I understand your data correctly, as it is not clear at which position (if at all) your "primary" lineID occurs.
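
    If streaming turns out not to be required and the document fits in memory, a key-based lookup might be worth trying as well. The following is only a sketch under assumptions: it reuses the element names from your stylesheet (entry, groupID, lineID, primary), the key name primary-by-group is made up for illustration, and it assumes a groupID value identifies one group across the whole document. Unlike your original, it takes the first primary entry of the group without checking lineID != $ID, which only matters if a group can contain several primary entries.

    ```xml
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
      <xsl:output indent="yes"/>

      <!-- Hypothetical key: index primary entries by groupID so each lookup
           uses the index instead of scanning all sibling entry elements. -->
      <xsl:key name="primary-by-group" match="entry[primary = 'true']" use="groupID"/>

      <!-- Identity template: copy everything not handled below. -->
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>

      <!-- Use the lineID of the group's first primary entry if there is one,
           otherwise keep the current value. -->
      <xsl:template match="lineID">
        <xsl:copy>
          <xsl:value-of select="(key('primary-by-group', ../groupID)/lineID, .)[1]"/>
        </xsl:copy>
      </xsl:template>
    </xsl:stylesheet>
    ```

    The predicate in your version is evaluated against every sibling entry for each lineID, while the key is built once and then queried per lineID, so this should scale better on large inputs, at the cost of holding the whole tree in memory.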