
XSLT: output into multiple xml files based on grouping


Assume you have the XML below. The goal is to group the Person elements by FirstName and export them into separate XML files. Each output file should contain at most X distinct FirstName values.

Below is an example of the desired transformation with X = 3

XML input:

<People>
    <Person>             
        <FirstName>John</FirstName>             
        <LastName>Doe</LastName> 
    </Person> 
    <Person>             
        <FirstName>Jack</FirstName>             
        <LastName>White</LastName> 
    </Person>
    <Person>             
        <FirstName>Mark</FirstName>             
        <LastName>Wall</LastName> 
    </Person>
    <Person>             
        <FirstName>John</FirstName>             
        <LastName>Ding</LastName> 
    </Person> 
    <Person>             
        <FirstName>Cyrus</FirstName>             
        <LastName>Ding</LastName> 
    </Person>  
    <Person>             
        <FirstName>Megan</FirstName>             
        <LastName>Boing</LastName> 
    </Person>
</People>          

XML output 1, with 3 distinct FirstName values:

<People>
    <Person>             
        <FirstName>John</FirstName>             
        <LastName>Doe</LastName> 
    </Person> 
    <Person>             
        <FirstName>John</FirstName>             
        <LastName>Ding</LastName> 
    </Person>
    <Person>             
        <FirstName>Jack</FirstName>             
        <LastName>White</LastName> 
    </Person>
    <Person>             
        <FirstName>Mark</FirstName>             
        <LastName>Wall</LastName> 
    </Person>  
</People> 

XML output 2, with the 2 remaining FirstName values:

<People>
    <Person>             
        <FirstName>Cyrus</FirstName>             
        <LastName>Ding</LastName> 
    </Person>  
    <Person>             
        <FirstName>Megan</FirstName>             
        <LastName>Boing</LastName> 
    </Person>
</People> 

It seems to me that Muenchian grouping can be combined with an instruction that writes multiple output files. However, the core question is: where can we set the threshold (the number of distinct FirstName values per file) before exporting to a new file?


Solution

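  • Here is one way to do it in two steps with XSLT 2.0: first group the Person elements by FirstName into temporary group elements, then group those group elements by position in chunks of $n and write each chunk to its own file: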

    <xsl:stylesheet
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      exclude-result-prefixes="xs"
      version="2.0">
    
      <xsl:param name="n" as="xs:integer" select="3"/>
    
      <xsl:output method="xml" indent="yes"/>
    
      <xsl:template match="People">
        <xsl:variable name="groups" as="element(group)*">
          <xsl:for-each-group select="Person" group-by="FirstName">
            <group>
              <xsl:copy-of select="current-group()"/>
            </group>
          </xsl:for-each-group>
        </xsl:variable>
        <xsl:for-each-group select="$groups" group-by="(position() - 1) idiv $n">
          <xsl:result-document href="group{position()}.xml">
            <People>
              <!-- copy only the Person children so the temporary group wrappers are not written out -->
              <xsl:copy-of select="current-group()/Person"/>
            </People>
          </xsl:result-document>
        </xsl:for-each-group>
      </xsl:template>
    
    </xsl:stylesheet>
    

    Here is an attempt to translate this into XSLT 1.0 with EXSLT. It relies on the exsl:node-set function and the exsl:document extension element, so the processor must support both:

    <xsl:stylesheet
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:exsl="http://exslt.org/common"
      extension-element-prefixes="exsl"
      exclude-result-prefixes="exsl"
      version="1.0">
    
      <xsl:param name="n" select="3"/>
    
      <xsl:output method="xml" indent="yes"/>
    
      <xsl:key name="person-by-firstname" 
               match="Person"
               use="FirstName"/>
    
      <xsl:template match="People">
        <xsl:variable name="groups">
          <xsl:for-each select="Person[generate-id() = generate-id(key('person-by-firstname', FirstName)[1])]">
            <group>
              <xsl:copy-of select="key('person-by-firstname', FirstName)"/>
            </group>
          </xsl:for-each>
        </xsl:variable>
        <xsl:for-each select="exsl:node-set($groups)/group[(position() - 1) mod $n = 0]">
          <exsl:document href="groupTest{position()}.xml">
            <People>
              <xsl:copy-of select="Person | following-sibling::group[position() &lt; $n]/Person"/>
            </People>
          </exsl:document>
        </xsl:for-each>
      </xsl:template>
    
    </xsl:stylesheet>
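
To make the chunking arithmetic in both stylesheets concrete, here is a hand-worked trace for the sample input from the question with n = 3. It shows the intermediate group elements in the order both stylesheets build them (first occurrence of each FirstName). The comments are my own annotations and the fragment is only an illustration of the temporary data, not output written by either stylesheet.

```xml
<!-- Temporary group elements, one per distinct FirstName.
     XSLT 2.0 grouping key:  (position() - 1) idiv 3
     XSLT 1.0 leader test:   (position() - 1) mod 3 = 0 -->
<group><!-- position 1: key 0, leader of file 1 -->
  <Person><FirstName>John</FirstName><LastName>Doe</LastName></Person>
  <Person><FirstName>John</FirstName><LastName>Ding</LastName></Person>
</group>
<group><!-- position 2: key 0 -->
  <Person><FirstName>Jack</FirstName><LastName>White</LastName></Person>
</group>
<group><!-- position 3: key 0 -->
  <Person><FirstName>Mark</FirstName><LastName>Wall</LastName></Person>
</group>
<group><!-- position 4: key 1, leader of file 2 -->
  <Person><FirstName>Cyrus</FirstName><LastName>Ding</LastName></Person>
</group>
<group><!-- position 5: key 1 -->
  <Person><FirstName>Megan</FirstName><LastName>Boing</LastName></Person>
</group>
```

In the 2.0 version, groups that share a key end up in the same result document (group1.xml, group2.xml). In the 1.0 version, only the leader groups are iterated, and each one pulls in its next n - 1 following siblings (groupTest1.xml, groupTest2.xml). Changing the threshold is just a matter of passing a different value for the n parameter.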