
Efficient XSLT conditional increment


In an earlier question I asked how to perform a conditional increment. The answer I received works, but it does not scale well to huge data sets.

The input:

<Users>
    <User>
        <id>1</id>
        <username>jack</username>
    </User>
    <User>
        <id>2</id>
        <username>bob</username>
    </User>
    <User>
        <id>3</id>
        <username>bob</username>
    </User>
    <User>
        <id>4</id>
        <username>jack</username>
    </User>
</Users>

The desired output (ideally computed with optimal time complexity):

<Users>
   <User>
      <id>1</id>
      <username>jack01</username>
   </User>
   <User>
      <id>2</id>
      <username>bob01</username>
   </User>
   <User>
      <id>3</id>
      <username>bob02</username>
   </User>
   <User>
      <id>4</id>
      <username>jack02</username>
   </User>
</Users>

One way to do this would be to:

  • sort the input by username
  • for each user:
    • if the previous username equals the current username
      • increment the counter
    • otherwise
      • set the counter to 1
    • set the username to '$username$counter', with the counter padded to two digits
  • (sort by id again; not required)
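For reference, the sort-then-count plan above can be sketched outside XSLT. The Python version below is only an illustration of the algorithm, not an XSLT solution. It uses the standard library's xml.etree.ElementTree, and the names number_usernames_by_sorting and SAMPLE are made up for this sketch. It sorts the users by name, walks them with a counter, and rewrites each username in place. Python's sort is stable, so users that share a name keep their document order. That is why bob with id 2 becomes bob01 and bob with id 3 becomes bob02. The sort makes the whole thing O(N log N).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input: the same four users as in the question.
SAMPLE = """<Users>
  <User><id>1</id><username>jack</username></User>
  <User><id>2</id><username>bob</username></User>
  <User><id>3</id><username>bob</username></User>
  <User><id>4</id><username>jack</username></User>
</Users>"""


def number_usernames_by_sorting(xml_text: str) -> str:
    """Hypothetical helper: suffix each username with its running count.

    Follows the sort-then-count plan; O(N log N) because of the sort.
    """
    root = ET.fromstring(xml_text)
    users = root.findall("User")

    # Sort by username. sorted() is stable, so users sharing a name
    # stay in document order (bob id 2 before bob id 3).
    ordered = sorted(users, key=lambda user: user.findtext("username"))

    previous = None
    counter = 0
    for user in ordered:
        name_el = user.find("username")
        name = name_el.text
        # Same name as the previous user in sorted order: increment;
        # otherwise a new group starts at 1.
        counter = counter + 1 if name == previous else 1
        previous = name
        # Two-digit, zero-padded suffix: jack -> jack01.
        name_el.text = f"{name}{counter:02d}"

    # Elements were edited in place, so the tree is still in document order.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(number_usernames_by_sorting(SAMPLE))
```

Because the elements are edited in place rather than moved, the final "sort by id again" step is not needed in this sketch.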

Any thoughts?


Solution

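  • This transformation produces exactly the wanted result. It runs in O(N) time, assuming the XSLT processor indexes xsl:key lookups (as mainstream processors do). It relies on the EXSLT ext:node-set() extension function to query the temporary tree it builds: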

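    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:ext="http://exslt.org/common" exclude-result-prefixes="ext">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>
    
     <!-- Index every User by its username -->
     <xsl:key name="kUserByName" match="User" use="username"/>
     <!-- Index the helper u elements by the generated id of the User each one stands for -->
     <xsl:key name="kUByGid" match="u" use="@gid"/>
    
     <!-- Muenchian grouping: take the first User of each username, then list every
          User of that group (in document order) with its 1-based position in the group -->
     <xsl:variable name="vOrderedByName">
      <xsl:for-each select=
      "/*/User[generate-id()=generate-id(key('kUserByName',username)[1])]">
         <xsl:for-each select="key('kUserByName',username)">
           <u gid="{generate-id()}" pos="{position()}"/>
         </xsl:for-each>
      </xsl:for-each>
     </xsl:variable>
    
     <!-- Identity rule: copy every node and attribute unchanged -->
     <xsl:template match="node()|@*">
         <xsl:copy>
           <xsl:apply-templates select="node()|@*"/>
         </xsl:copy>
     </xsl:template>
    
     <!-- Output the username followed by its zero-padded position in its group -->
     <xsl:template match="username/text()">
         <xsl:value-of select="."/>
         <!-- generate-id() of the enclosing User, i.e. the @gid stored in vOrderedByName -->
         <xsl:variable name="vGid" select="generate-id(../..)"/>
    
         <!-- key() only searches the document of the context node, so switch
              the context to the root of the temporary tree before the lookup -->
         <xsl:for-each select="ext:node-set($vOrderedByName)[1]">
          <xsl:value-of select="format-number(key('kUByGid', $vGid)/@pos, '00')"/>
         </xsl:for-each>
     </xsl:template>
    </xsl:stylesheet>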
    

    When applied to the provided XML document:

    <Users>
        <User>
            <id>1</id>
            <username>jack</username>
        </User>
        <User>
            <id>2</id>
            <username>bob</username>
        </User>
        <User>
            <id>3</id>
            <username>bob</username>
        </User>
        <User>
            <id>4</id>
            <username>jack</username>
        </User>
    </Users>
    

    the wanted, correct result is produced:

    <Users>
       <User>
          <id>1</id>
          <username>jack01</username>
       </User>
       <User>
          <id>2</id>
          <username>bob01</username>
       </User>
       <User>
          <id>3</id>
          <username>bob02</username>
       </User>
       <User>
          <id>4</id>
          <username>jack02</username>
       </User>
    </Users>
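For comparison, here is a rough Python analogue of what the stylesheet computes. It is again only a sketch: it uses the standard library, and the names number_usernames_by_index and SAMPLE are made up. The mapping to the stylesheet is:

  • a hash index from username to users in document order (the role of kUserByName)
  • a map from node identity to its 1-based position in that group (the role of the vOrderedByName variable and kUByGid)
  • one lookup per user (the username/text() template)

Dict operations are O(1) on average, so the expected cost is O(N).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample input: the same four users as in the question.
SAMPLE = """<Users>
  <User><id>1</id><username>jack</username></User>
  <User><id>2</id><username>bob</username></User>
  <User><id>3</id><username>bob</username></User>
  <User><id>4</id><username>jack</username></User>
</Users>"""


def number_usernames_by_index(xml_text: str) -> str:
    """Hypothetical helper mirroring the keyed stylesheet; expected O(N)."""
    root = ET.fromstring(xml_text)
    users = root.findall("User")

    # Role of xsl:key kUserByName: username -> users, in document order.
    by_name = defaultdict(list)
    for user in users:
        by_name[user.findtext("username")].append(user)

    # Role of the vOrderedByName variable plus xsl:key kUByGid:
    # node identity -> 1-based position within its username group.
    # id() stands in for generate-id(); every element stays referenced
    # by `users`, so the ids remain unique while the map is in use.
    pos_by_node = {}
    for group in by_name.values():
        for pos, user in enumerate(group, start=1):
            pos_by_node[id(user)] = pos

    # Role of the username/text() template: one average-O(1) lookup per user.
    for user in users:
        name_el = user.find("username")
        name_el.text = f"{name_el.text}{pos_by_node[id(user)]:02d}"

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(number_usernames_by_index(SAMPLE))
```

No sorting step is involved, so the O(N log N) term of a sort-first plan does not appear.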