Changing the element hierarchy with XSLT by grouping elements by attribute values, without assumptions about the attribute values?


Question

With XSLT, how can I change the element hierarchy by grouping elements by attribute values, without making assumptions about those values?

Description of the problem

The context of the document is the following: the XML tracks the change notes (<releaseHistory/>) of a software framework as new versions are released (<build/>). The framework has several apps/components (<changes app='LibraryA|Driver|...'/>). The change notes log new features and bug fixes (<list kind='New|Enhancement'/>).

I would like to transform this document so that all the change notes across the different builds are merged into lists grouped by the 'app' and 'kind' attribute values, with list items (<li/>) sorted by the 'priority' attribute.

In addition, no assumption should be made about the 'app' and 'kind' attribute values. Note that, if needed, I can change the schema of the XML if the current schema is not ideal.

Current status

  • What I was able to do:
    • retrieve the lists of unique 'app' and 'kind' attribute values.
    • write a template that takes 'app' and 'kind' as parameters and traverses the XML document to merge all the elements whose attributes match the arguments.
  • What is missing:
    • 'looping' over the above lists of unique attribute values and applying the template.

Input and expected output

The xml document:

<?xml version="1.0" encoding="UTF-8"?>

<releaseHistory>

 <build>
   <description>A killer update</description>
   <changes app='LibraryA'>
     <list kind='New'>
       <li priority='4'>Added feature about X</li>
       <li priority='2'>Faster code for big matrices</li>
     </list>
     <list kind='Enhancement'>
       <li priority='1'>Fixed integer addition</li>
     </list>
   </changes>
   <changes app='Driver'>
     <list kind='New'>
       <li priority='3'>Supporting new CPU models</li>
       <li priority='4'>Cross-platform-ness</li>
     </list>
   </changes>
 </build>

 <build>
   <description>An update for Easter</description>
   <changes app='LibraryA'>
     <list kind='New'>
       <li priority='1'>New feature about Y</li>
     </list>
     <list kind='Enhancement'>
       <li priority='2'>Fixed bug 63451</li>
     </list>
   </changes>
   <changes app='LibraryVector'>
     <list kind='Enhancement'>
       <li priority='5'>Fixed bug 59382</li>
     </list>
   </changes>
   <changes app='Driver'>
     <list kind='New'>
       <li priority='0'>Compatibility with hardware Z</li>
     </list>
   </changes>
 </build>

</releaseHistory>

Expected document:

<?xml version="1.0" encoding="UTF-8"?>

<mergedHistory>

  <changes app='LibraryA'>
   <list kind='New'>
     <li priority='1'>New feature about Y</li>
     <li priority='2'>Faster code for big matrices</li>
     <li priority='4'>Added feature about X</li>
   </list>
   <list kind='Enhancement'>
      <li priority='1'>Fixed integer addition</li>
      <li priority='2'>Fixed bug 63451</li>
   </list>
  </changes>

  <changes app='Driver'>
    <list kind='New'>
      <li priority='0'>Compatibility with hardware Z</li>
      <li priority='3'>Supporting new CPU models</li>
      <li priority='4'>Cross-platform-ness</li>
    </list>
  </changes>

  <changes app='LibraryVector'>
    <list kind='Enhancement'>
      <li priority='5'>Fixed bug 59382</li>
    </list>
  </changes>

</mergedHistory>

Part of the solution

I am already able to list the unique 'app' and 'kind' attribute values with XSLT. Here is the current state of the stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exsl="http://exslt.org/common" extension-element-prefixes="exsl"
>

Retrieve all distinct 'app' attribute values (LibraryA, Driver, ...) of <changes app='...'/> and store them in a variable (could be a param):

<xsl:key name="appDistinct" match="changes" use="@app"/>
<xsl:variable name="applicationListVarTmp">
  <list>
    <xsl:for-each select="//changes[generate-id() = generate-id(key('appDistinct', @app)[1])]">
      <li>
        <xsl:value-of select="normalize-space(@app)"/>
      </li>
    </xsl:for-each>
  </list>
</xsl:variable>

Retrieve all distinct 'kind' attribute values (New, Enhancement) of <list kind='...'/>:

<xsl:key name="kindDistinct" match="changes/list" use="@kind"/>
<xsl:variable name="kindListVar">
  <list>
    <xsl:for-each select="//changes/list[generate-id() = generate-id(key('kindDistinct', @kind)[1])]">
      <li>
        <xsl:value-of select="normalize-space(@kind)"/>
      </li>
    </xsl:for-each>
  </list>
</xsl:variable>

A template to merge all <li/> of a given 'app' and 'kind' (ordered by priority) with parameters:

<xsl:template name="mergeSameKindChangesForAnApp">
  <xsl:param name="application" />
  <xsl:param name="kindness" />
  <list kind="{$kindness}">
    <xsl:for-each select="//changes[@app=$application]/list[@kind=$kindness]/li">
      <xsl:sort select="@priority" data-type="number" order="ascending"/>
      <!-- Copy the whole <li>, including its attributes and text content
           (select="./*" would copy child elements only and drop the text) -->
      <xsl:copy-of select="."/>
    </xsl:for-each>
  </list>
</xsl:template>

Now, where I am stuck is 'looping' over applicationListVar and kindListVar to apply the template.

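If all the 'app' and 'kind' values were hardcoded, I could make several calls like:

<!-- <xsl:call-template> may only contain <xsl:with-param>, so the wrapper goes outside -->
<changes app='LibraryA'>
  <xsl:call-template name="mergeSameKindChangesForAnApp">
    <!-- String literals via select avoid stray whitespace in the parameter values -->
    <xsl:with-param name="application" select="'LibraryA'"/>
    <xsl:with-param name="kindness" select="'New'"/>
  </xsl:call-template>
</changes>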

but I would like to loop over the 'app' and 'kind' values found in the XML document. With exsl:node-set(), for example, I could do:

<xsl:param name="applicationListVar" select="exsl:node-set($applicationListVarTmp)" />


<!-- The wrapper takes its 'app' value from the same list entry that is passed to the template -->
<changes app="{$applicationListVar/list/li[2]}">
  <xsl:call-template name="mergeSameKindChangesForAnApp">
    <xsl:with-param name="application" select="string($applicationListVar/list/li[2])"/>
    <xsl:with-param name="kindness" select="'New'"/>
  </xsl:call-template>
</changes>

but still, how do I loop over the $applicationListVar/list/li elements? 'Looping' doesn't sound very XSLT-like, so maybe (surely?) it is not the right approach.

The question is long, even though I have tried to simplify it compared to the actual case.


Solution

  • This should do it:

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
    
      <xsl:key name="kChange" match="changes" use="@app" />
    
      <!-- A key for locating <list>s by the combination of their @app and @kind-->
      <xsl:key name="kList" match="changes/list" use="concat(../@app, '+', @kind)" />
    
      <!-- A node-set of the first instance of each <list> for each distinct
           pair of @app + @kind -->
      <xsl:variable name="distinctLists"
                    select="//changes/list[generate-id() = 
                               generate-id(key('kList', 
                                               concat(../@app, '+', @kind) )[1]
                                          )]"/>
    
      <!-- Identity template -->
      <xsl:template match="@* | node()">
        <xsl:copy>
          <xsl:apply-templates select="@* | node()" />
        </xsl:copy>
      </xsl:template>
    
      <xsl:template match="/*">
        <mergedHistory>
          <!-- Apply templates on distinct <changes> elements -->
          <xsl:apply-templates select="build/changes[generate-id() = 
                                       generate-id(key('kChange', @app)[1])]" />
        </mergedHistory>
      </xsl:template>
    
      <!-- Each distinct <changes> (based on @app) will be sent to this template -->
      <xsl:template match="changes">
        <changes>
          <xsl:apply-templates select="@*" />
    
          <!-- Apply templates on each distinct <list> with the same @app
               as the current context-->
          <xsl:apply-templates select="$distinctLists[../@app = current()/@app]" />
        </changes>
      </xsl:template>
    
      <!-- Each distinct <list> (based on @app and @kind) will be 
           sent to this template -->
      <xsl:template match="list">
        <list>
          <xsl:apply-templates select="@*" />
    
          <!-- Apply templates on all <li>s below <list>s with the same @app and @kind
               as the current one -->
          <xsl:apply-templates select="key('kList', concat(../@app, '+', @kind))/li">
            <xsl:sort select="@priority" order="ascending" data-type="number"/>
          </xsl:apply-templates>
        </list>
      </xsl:template>
    </xsl:stylesheet>
    

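    The technique to note here is keying items on a pair of values instead of a single value (Muenchian grouping with a composite key). The same key serves two purposes: picking out the first instance for each distinct pair of values, and then retrieving every instance that shares that pair. The '+' separator in concat() is assumed not to occur in the attribute values themselves; otherwise two different pairs could produce the same key.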

    When this is run on your sample input, it produces the requested output:

    <mergedHistory>
      <changes app="LibraryA">
        <list kind="New">
          <li priority="1">New feature about Y</li>
          <li priority="2">Faster code for big matrices</li>
          <li priority="4">Added feature about X</li>
        </list>
        <list kind="Enhancement">
          <li priority="1">Fixed integer addition</li>
          <li priority="2">Fixed bug 63451</li>
        </list>
      </changes>
      <changes app="Driver">
        <list kind="New">
          <li priority="0">Compatibility with hardware Z</li>
          <li priority="3">Supporting new CPU models</li>
          <li priority="4">Cross-platform-ness</li>
        </list>
      </changes>
      <changes app="LibraryVector">
        <list kind="Enhancement">
          <li priority="5">Fixed bug 59382</li>
        </list>
      </changes>
    </mergedHistory>
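
    For comparison, the 'looping' approach from the question can probably be made to work as well. The following is only a sketch, assuming an XSLT 1.0 processor that supports the EXSLT exsl:node-set() function (the question already declares that namespace). The variable names src, appListTmp and kindListTmp are made up for illustration. The point to watch is that inside a for-each over the converted node-set the context document is the temporary tree, so the source document has to be reached through a variable captured beforehand:

    ```xml
    <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:exsl="http://exslt.org/common"
      extension-element-prefixes="exsl">
      <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>

      <xsl:key name="appDistinct" match="changes" use="@app"/>
      <xsl:key name="kindDistinct" match="changes/list" use="@kind"/>

      <!-- Hypothetical helper: a handle on the root of the source document -->
      <xsl:variable name="src" select="/"/>

      <!-- Distinct @app values, in order of first appearance -->
      <xsl:variable name="appListTmp">
        <xsl:for-each select="//changes[generate-id() = generate-id(key('appDistinct', @app)[1])]">
          <li><xsl:value-of select="@app"/></li>
        </xsl:for-each>
      </xsl:variable>

      <!-- Distinct @kind values, in order of first appearance -->
      <xsl:variable name="kindListTmp">
        <xsl:for-each select="//changes/list[generate-id() = generate-id(key('kindDistinct', @kind)[1])]">
          <li><xsl:value-of select="@kind"/></li>
        </xsl:for-each>
      </xsl:variable>

      <xsl:template match="/">
        <mergedHistory>
          <!-- Outer loop over the distinct apps -->
          <xsl:for-each select="exsl:node-set($appListTmp)/li">
            <xsl:variable name="app" select="string(.)"/>
            <changes app="{$app}">
              <!-- Inner loop over the distinct kinds -->
              <xsl:for-each select="exsl:node-set($kindListTmp)/li">
                <xsl:variable name="kind" select="string(.)"/>
                <!-- Look the items up in the source document, not in the temporary tree -->
                <xsl:variable name="items"
                              select="$src//changes[@app = $app]/list[@kind = $kind]/li"/>
                <!-- Skip app/kind combinations that never occur together -->
                <xsl:if test="$items">
                  <list kind="{$kind}">
                    <xsl:for-each select="$items">
                      <xsl:sort select="@priority" data-type="number"/>
                      <xsl:copy-of select="."/>
                    </xsl:for-each>
                  </list>
                </xsl:if>
              </xsl:for-each>
            </changes>
          </xsl:for-each>
        </mergedHistory>
      </xsl:template>
    </xsl:stylesheet>
    ```

    Compared with the key-based stylesheet above, this version scans the source document once per app/kind combination and depends on an extension function, so the composite key is likely the better choice when the input is large or portability matters.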