xml, xslt, xslt-2.0, xslt-grouping

Advanced grouping by sequence of child nodes


I'm trying to loop over the "destination" elements grouped by the ids of their trips. That is, all destinations containing exactly the same set of trips (determined by trip id) should be handled in one iteration.

In the example XML below, the first two destinations each contain four trips. The four trips of destination "1" / "Bahamas" are the same four trips as those of destination "2" / "Hawaii". The third destination contains a different set of trips. I'd like to make two iterations here: the first containing destinations "1" and "2" (as all trips in destination "1" are in destination "2" and vice versa), and the second containing destination "3".

If destination "2" did not have the trip with id = "4", I would expect three iterations: one for each destination, as none of them would contain exactly the same trips.

The data is structured more or less like this, but in a different context and with more data, so don't pay too much attention to the structure. Changing the structure is unfortunately not an option, as I do not control the data.

<destinations>
  <destination>
    <key>1</key>
    <location>Bahamas</location>
    <tags>
      <tag>summer</tag>
      <tag>beach</tag>
      <tag>surfing</tag>
    </tags>
    <easy-trips>
      <trip>
        <id>1</id>
      </trip>
      <trip>
        <id>2</id>
      </trip>
    </easy-trips>
    <experienced-trips>
      <trip>
        <id>3</id>
      </trip>
      <trip>
        <id>4</id>
      </trip>
    </experienced-trips>
  </destination>
  <destination>
    <key>2</key>
    <location>Hawaii</location>
    <tags>
      <tag>summer</tag>
      <tag>beach</tag>
      <tag>surfing</tag>
    </tags>
    <easy-trips>
      <trip>
        <id>1</id>
      </trip>
      <trip>
        <id>2</id>
      </trip>
    </easy-trips>
    <experienced-trips>
      <trip>
        <id>3</id>
      </trip>
      <trip>
        <id>4</id>
      </trip>
    </experienced-trips>
  </destination>
  <destination>
    <key>3</key>
    <location>Rio</location>
    <tags>
      <tag>big city life</tag>
      <tag>samba</tag>
    </tags>
    <easy-trips>
      <trip>
        <id>8</id>
      </trip>
      <trip>
        <id>9</id>
      </trip>
    </easy-trips>
    <experienced-trips>
      <trip>
        <id>10</id>
      </trip>
      <trip>
        <id>11</id>
      </trip>
    </experienced-trips>
  </destination>
</destinations>

What I've tried so far

<xsl:for-each-group select="/destinations/destination" group-by="current()//id">

This looked promising at first, when I only had one trip in each destination, but it no longer works once there are more trips. In the example above it would loop over the first two destinations as a group four times (once for each id they have in common), and then go on to destination "3" / "Rio", which would likewise be visited once per id.

I also looked into the Muenchian Method, but it didn't seem to get me any further than the "for-each-group" attempt.
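
The over-grouping can be made visible with a small sketch (XSLT 2.0; the groups/group elements and their attributes are made up purely for illustration). When group-by evaluates to several values, a destination is placed in one group per distinct value:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output indent="yes"/>

<xsl:template match="/destinations">
  <groups>
    <!-- group-by yields one key per id, so each destination joins one group per trip -->
    <xsl:for-each-group select="destination" group-by=".//id">
      <!-- list the grouping key and the <key> of every destination in this group -->
      <group trip-id="{current-grouping-key()}" destinations="{current-group()/key}"/>
    </xsl:for-each-group>
  </groups>
</xsl:template>

</xsl:stylesheet>

For the sample above (with well-formed closing tags) this should report eight groups: trip ids 1 to 4 each list destinations "1 2", and trip ids 8 to 11 each list destination "3".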

Got a solution?

Any ideas on how to solve this are highly appreciated! Thank you in advance for your time and help! :-)

--- EDIT ---

Added more data (tags) to the example XML. As a consequence of the data structure, destinations containing exactly the same trips will always have the same tags, like destinations "1" and "2" in the example.

Desired output

<tagCollections>
  <tagCollection>
    <tag>summer</tag>
    <tag>beach</tag>
    <tag>surfing</tag>
  </tagCollection>
  <tagCollection>
    <tag>big city life</tag>
    <tag>samba</tag>
  </tagCollection>
</tagCollections>

Solution

  • I think you want to compute a composite grouping key in XSLT 2.0 with code like <xsl:for-each-group select="/destinations/destination" group-by="string-join(.//id, '+')">. The plus sign is just an example; use any character that never occurs in an id value.

    This suggestion assumes the ids appear in the same order in every destination. If they don't, you need to write a function that sorts them first and call it in the group-by.

    Here is an XSLT 3.0 sample (it can be run with the commercial versions of Saxon 9) that shows how to use a function to sort a sequence and how to use that within the group-by. Note that it casts the ids to xs:integer, so it assumes they are numeric:
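
    As a minimal sketch of that unsorted variant (XSLT 2.0, assuming every destination lists its ids in the same document order), the whole stylesheet could look like this. The separator matters: without it, two hypothetical id lists such as (1, 23) and (12, 3) would both produce the key "123".

    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output indent="yes"/>

    <xsl:template match="/destinations">
      <tagCollections>
        <!-- one group per distinct id list; the key looks like "1+2+3+4" -->
        <xsl:for-each-group select="destination" group-by="string-join(.//id, '+')">
          <tagCollection>
            <!-- the context item is the first destination of the current group -->
            <xsl:copy-of select="tags/tag"/>
          </tagCollection>
        </xsl:for-each-group>
      </tagCollections>
    </xsl:template>

    </xsl:stylesheet>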

    <xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns:mf="http://example.com/mf"
      exclude-result-prefixes="xs mf">
    
    <xsl:output indent="yes"/>
    
    <!-- returns the items of the input sequence in ascending order -->
    <xsl:function name="mf:sort">
      <xsl:param name="input-sequence" as="item()*"/>
      <xsl:perform-sort select="$input-sequence">
        <xsl:sort select="."/>
      </xsl:perform-sort>
    </xsl:function>
    
    <xsl:template match="destinations">
      <tagCollections>
        <!-- key: ids cast to integers, sorted, then joined, e.g. "1+2+3+4" -->
        <xsl:for-each-group select="/destinations/destination" 
            group-by="string-join(mf:sort(.//id/xs:integer(.))!string(), '+')">
          <tagCollection>
            <xsl:copy-of select="tags/tag"/>
          </tagCollection>
        </xsl:for-each-group>
      </tagCollections>
    </xsl:template>
    
    </xsl:stylesheet>
    

    When applied to the input sample

    <destinations>
      <destination>
        <key>1</key>
        <location>Bahamas</location>
        <tags>
          <tag>summer</tag>
          <tag>beach</tag>
          <tag>surfing</tag>
        </tags>
        <easy-trips>
          <trip>
            <id>1</id>
          </trip>
          <trip>
            <id>2</id>
          </trip>
        </easy-trips>
        <experienced-trips>
          <trip>
            <id>3</id>
          </trip>
          <trip>
            <id>4</id>
          </trip>
        </experienced-trips>
      </destination>
      <destination>
        <key>2</key>
        <location>Hawaii</location>
        <tags>
          <tag>summer</tag>
          <tag>beach</tag>
          <tag>surfing</tag>
        </tags>
        <easy-trips>
          <trip>
            <id>4</id>
          </trip>
          <trip>
            <id>3</id>
          </trip>
        </easy-trips>
        <experienced-trips>
          <trip>
            <id>2</id>
          </trip>
          <trip>
            <id>1</id>
          </trip>
        </experienced-trips>
      </destination>
      <destination>
        <key>3</key>
        <location>Rio</location>
        <tags>
          <tag>big city life</tag>
          <tag>samba</tag>
        </tags>
        <easy-trips>
          <trip>
            <id>8</id>
          </trip>
          <trip>
            <id>9</id>
          </trip>
        </easy-trips>
        <experienced-trips>
          <trip>
            <id>10</id>
          </trip>
          <trip>
            <id>11</id>
          </trip>
        </experienced-trips>
      </destination>
    </destinations>
    

    I get the output

    <tagCollections>
       <tagCollection>
          <tag>summer</tag>
          <tag>beach</tag>
          <tag>surfing</tag>
       </tagCollection>
       <tagCollection>
          <tag>big city life</tag>
          <tag>samba</tag>
       </tagCollection>
    </tagCollections>
    

    This is of course also possible in XSLT 2.0; the expression is just a bit longer:

    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns:mf="http://example.com/mf"
      exclude-result-prefixes="xs mf">
    
    <xsl:output indent="yes"/>
    
    <!-- returns the items of the input sequence in ascending order -->
    <xsl:function name="mf:sort">
      <xsl:param name="input-sequence" as="item()*"/>
      <xsl:perform-sort select="$input-sequence">
        <xsl:sort select="."/>
      </xsl:perform-sort>
    </xsl:function>
    
    <xsl:template match="destinations">
      <tagCollections>
        <!-- key: ids cast to integers, sorted, then joined, e.g. "1+2+3+4" -->
        <xsl:for-each-group select="/destinations/destination" 
            group-by="string-join(for $n in mf:sort(.//id/xs:integer(.)) return string($n), '+')">
          <tagCollection>
            <xsl:copy-of select="tags/tag"/>
          </tagCollection>
        </xsl:for-each-group>
      </tagCollections>
    </xsl:template>
    
    </xsl:stylesheet>
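
    If you need the destinations of each group rather than only the tags, current-group() gives you all members of the group and current-grouping-key() the computed key. A sketch along the same lines (the groups/group elements and the trips attribute are made up for illustration, and the ids are again assumed to be integers):

    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns:mf="http://example.com/mf"
      exclude-result-prefixes="xs mf">

    <xsl:output indent="yes"/>

    <!-- returns the items of the input sequence in ascending order -->
    <xsl:function name="mf:sort">
      <xsl:param name="input-sequence" as="item()*"/>
      <xsl:perform-sort select="$input-sequence">
        <xsl:sort select="."/>
      </xsl:perform-sort>
    </xsl:function>

    <xsl:template match="/destinations">
      <groups>
        <xsl:for-each-group select="destination"
            group-by="string-join(for $n in mf:sort(.//id/xs:integer(.)) return string($n), '+')">
          <!-- one iteration per distinct set of trip ids -->
          <group trips="{current-grouping-key()}">
            <!-- every destination sharing this set of trips -->
            <xsl:copy-of select="current-group()/location"/>
          </group>
        </xsl:for-each-group>
      </groups>
    </xsl:template>

    </xsl:stylesheet>

    For the input sample above this should give one group with trips="1+2+3+4" containing the locations Bahamas and Hawaii, and one with trips="8+9+10+11" containing Rio.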