XSLT grouping on multiple keys using Muenchian method


This is the input file.

Each block is wrapped in an <AllocFile> element, and all of these blocks are wrapped in a top-level <XML> element.

<XML>
  <AllocFile>
    <alc>1</alc>
    <No>11/10</No>
    <DT>20090401</DT> 
    <G_H>147</G_H>
    <FUN>125487</FUN>
    <oH>11</oH>
    <y>9</y>
    <AMOUNT>8000000</AMOUNT>
    <Code>033195</Code>
    <hd1>1234</hd1>
  </AllocFile>
  <AllocFile>
    <alc>2</alc>
    <No>14/10</No>
    <DT>20090401</DT>
    <G_H>147</G_H>
    <FUN>125487</FUN>
    <oH>11</oH>
    <y>9</y>
    <AMOUNT>8400000</AMOUNT>
    <Code>033195</Code>
    <hd1>1234</hd1>
  </AllocFile>
  <AllocFile>
    <alc>3</alc>
    <No>74/10</No>
    <DT>20090401</DT>
    <G_H>147</G_H>
    <FUN>125487</FUN>
    <oH>11</oH>
    <y>9</y>
    <AMOUNT>8740000</AMOUNT>
    <Code>033195</Code>
    <hd1>1234</hd1>
  </AllocFile>
  <AllocFile>
    <alc>2</alc>
    <No>74/10</No>
    <DT>20090401</DT>
    <G_H>117</G_H>
    <FUN>125487</FUN>
    <oH>19</oH>
    <y>9</y>
    <AMOUNT>74512</AMOUNT>
    <Code>033118</Code>
    <hd1>1234</hd1>
  </AllocFile>
  <AllocFile>
    <alc>3</alc>
    <No>14/10</No>
    <DT>20090401</DT>
    <G_H>117</G_H>
    <FUN>125487</FUN>
    <oH>19</oH>
    <y>9</y>
    <AMOUNT>986541</AMOUNT>
    <Code>033147</Code>
    <hd1>1234</hd1>
  </AllocFile> 
</XML>

The output is

<Header1>
  <Hd1>1234</Hd1>
  <CodeHeader>
    <Code>033195</Code>
    <Header2>
      <G_H>147</G_H>
      <FUN>125487</FUN>
      <oH>11</oH>
      <y>9</y>
      <allocheader>
        <alc>1</alc>
        <No>11/10</No>
        <DT>20090401</DT>
        <AMOUNT>8000000</AMOUNT>
      </allocheader>
      <allocheader>
        <alc>2</alc>
        <No>14/10</No>
        <DT>20090401</DT>
        <AMOUNT>8400000</AMOUNT>
      </allocheader>
      <allocheader>
        <alc>3</alc>
        <No>74/10</No>
        <DT>20090401</DT>
        <AMOUNT>8740000</AMOUNT>
      </allocheader>
    </Header2>
  </CodeHeader>
  <CodeHeader>
    <Code>033118</Code>
    <Header2>
      <G_H>117</G_H>
      <FUN>125487</FUN>
      <oH>19</oH>
      <y>9</y>
      <allocheader>
        <alc>2</alc>
        <No>74/10</No>
        <DT>20090401</DT>
        <AMOUNT>74512</AMOUNT>
      </allocheader>
    </Header2>
  </CodeHeader>
  <CodeHeader>
    <Code>033147</Code>
    <Header2>
      <G_H>117</G_H>
      <FUN>125487</FUN>
      <oH>19</oH>
      <y>9</y>
      <allocheader>
        <alc>3</alc>
        <No>14/10</No>
        <DT>20090401</DT>
        <AMOUNT>986541</AMOUNT>
      </allocheader>
    </Header2>
  </CodeHeader>
</Header1>

The input file needs to be sorted and grouped on the basis of multiple keys. I tried using the concat function and the Muenchian method but didn't find much help on the web. I am using XSLT 1.0.

Rules for Grouping

  • All the nodes in the file will have <hd1> with the value 1234. This is the first grouping key and appears in the output as <Header1>.

  • The second grouping key is the <Code> node. Nodes with the same value are grouped together. It appears in the output as <CodeHeader>.

  • The third grouping key is the set of nodes <G_H>, <FUN>, <oH>, <y>. Nodes with the same values for all of these are grouped together. It appears in the output as <Header2>.

  • No grouping happens on the nodes <alc>, <No>, <DT>, <AMOUNT>. They have distinct values within each group.


Solution

  • If the hd1 element is always '1234' then you are not really grouping by it, but if you were, you would define a simple key like so

    <xsl:key name="header1" match="AllocFile" use="hd1" />
    

    For the second key, you would need to take account of the Code element

    <xsl:key name="header2" match="AllocFile" use="concat(hd1, '|', Code)" />
    

    And then for the last key, you would define a more complicated key to cope with all the elements

    <xsl:key name="header3" 
       match="AllocFile" 
       use="concat(hd1, '|', Code, '|', G_H, '|', FUN, '|', oH, '|', y)" />
    

    Do note the use of the 'pipe' character as the delimiter. It is important to pick a delimiter that would never occur in any of the selected elements.
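
    To see why this matters, take two hypothetical records that are not in your sample data: one with a Code of 'AB' and a G_H of 'C', the other with a Code of 'A' and a G_H of 'BC'. The key names 'unsafe' and 'safe' below are made up purely for illustration. Without a delimiter both records produce the same key value and would be treated as one group.

    <!-- Hypothetical key without a delimiter: 'AB' + 'C' and 'A' + 'BC' both give 'ABC' -->
    <xsl:key name="unsafe" match="AllocFile" use="concat(Code, G_H)" />

    <!-- With a delimiter the values are 'AB|C' and 'A|BC', so the two groups stay separate -->
    <xsl:key name="safe" match="AllocFile" use="concat(Code, '|', G_H)" />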

    Then, to look for the distinct header1 elements, you would look for the elements which appear first in the header1 key

    <xsl:apply-templates 
       select="AllocFile[generate-id() = generate-id(key('header1', hd1)[1])]" 
       mode="header1" />
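
    The generate-id() comparison is one of two common ways of writing this test in XSLT 1.0. As a sketch of the other one, which should select the same nodes, you can take the union of the current node and the first node in its group and check that it contains only one node. Either form works here, so this is a matter of preference.

    <!-- The union of the current AllocFile and the first AllocFile in its group
         has a count of 1 only when they are the same node -->
    <xsl:apply-templates 
       select="AllocFile[count(. | key('header1', hd1)[1]) = 1]" 
       mode="header1" />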
    

    To find the distinct Code elements within each header1 element, you would do the following

    <xsl:apply-templates 
       select="key('header1', hd1)
         [generate-id() = generate-id(key('header2', concat(hd1, '|', Code))[1])]" 
       mode="header2" /> 
    

    Finally, within each code group, to find the distinct 'header3' elements, you would look for the first elements within the third key

    <xsl:apply-templates 
     select="key('header2', concat(hd1, '|', Code))
        [generate-id() = 
         generate-id(key('header3', concat(hd1, '|', Code, '|', G_H, '|', FUN, '|', oH, '|', y))[1])]" 
     mode="header3" /> 
    

    Here is the full XSLT

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
       <xsl:output method="xml" indent="yes"/>
    
       <xsl:key name="header1" match="AllocFile" use="hd1"/>
       <xsl:key name="header2" match="AllocFile" use="concat(hd1, '|', Code)"/>
       <xsl:key name="header3" match="AllocFile" use="concat(hd1, '|', Code, '|', G_H, '|', FUN, '|', oH, '|', y)"/>
    
       <xsl:template match="/XML">
          <xsl:apply-templates select="AllocFile[generate-id() = generate-id(key('header1', hd1)[1])]" mode="header1"/>
       </xsl:template>
    
       <xsl:template match="AllocFile" mode="header1">
          <Header1>
             <Hd1>
                <xsl:value-of select="hd1"/>
             </Hd1>
             <xsl:apply-templates select="key('header1', hd1)[generate-id() = generate-id(key('header2', concat(hd1, '|', Code))[1])]" mode="header2"/>
          </Header1>
       </xsl:template>
    
       <xsl:template match="AllocFile" mode="header2">
          <CodeHeader>
             <xsl:copy-of select="Code"/>
             <xsl:apply-templates select="key('header2', concat(hd1, '|', Code))[generate-id() = generate-id(key('header3', concat(hd1, '|', Code, '|', G_H, '|', FUN, '|', oH, '|', y))[1])]" mode="header3"/>
          </CodeHeader>
       </xsl:template>
    
       <xsl:template match="AllocFile" mode="header3">
          <Header2>
             <xsl:copy-of select="G_H|FUN|oH|y"/>
             <xsl:apply-templates select="key('header3', concat(hd1, '|', Code, '|', G_H, '|', FUN, '|', oH, '|', y))"/>
          </Header2>
       </xsl:template>
    
       <xsl:template match="AllocFile">
          <allocheader>
             <xsl:copy-of select="alc|No|DT|AMOUNT"/>
          </allocheader>
       </xsl:template>
    </xsl:stylesheet>
    

    Do note the use of the mode attribute on the templates to distinguish between the multiple templates that all match the AllocFile elements.

    When applied to your sample XML, the following is output

    <Header1>
       <Hd1>1234</Hd1>
       <CodeHeader>
          <Code>033195</Code>
          <Header2>
             <G_H>147</G_H>
             <FUN>125487</FUN>
             <oH>11</oH>
             <y>9</y>
             <allocheader>
                <alc>1</alc>
                <No>11/10</No>
                <DT>20090401</DT>
                <AMOUNT>8000000</AMOUNT>
             </allocheader>
             <allocheader>
                <alc>2</alc>
                <No>14/10</No>
                <DT>20090401</DT>
                <AMOUNT>8400000</AMOUNT>
             </allocheader>
             <allocheader>
                <alc>3</alc>
                <No>74/10</No>
                <DT>20090401</DT>
                <AMOUNT>8740000</AMOUNT>
             </allocheader>
          </Header2>
       </CodeHeader>
       <CodeHeader>
          <Code>033118</Code>
          <Header2>
             <G_H>117</G_H>
             <FUN>125487</FUN>
             <oH>19</oH>
             <y>9</y>
             <allocheader>
                <alc>2</alc>
                <No>74/10</No>
                <DT>20090401</DT>
                <AMOUNT>74512</AMOUNT>
             </allocheader>
          </Header2>
       </CodeHeader>
       <CodeHeader>
          <Code>033147</Code>
          <Header2>
             <G_H>117</G_H>
             <FUN>125487</FUN>
             <oH>19</oH>
             <y>9</y>
             <allocheader>
                <alc>3</alc>
                <No>14/10</No>
                <DT>20090401</DT>
                <AMOUNT>986541</AMOUNT>
             </allocheader>
          </Header2>
       </CodeHeader>
    </Header1>
    

    If you did have different hd1 elements, other than '1234', you would end up with multiple Header1 elements, and so your output would not be well-formed XML. It would be simple enough to wrap them in a root element though, by modifying the initial template matching the document element.

    <xsl:template match="/XML">
       <Root>
          <xsl:apply-templates select="AllocFile[generate-id() = generate-id(key('header1', hd1)[1])]" mode="header1" />
       </Root>
    </xsl:template>
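
    You also mention that the file needs to be sorted, although your expected output keeps the groups in document order. If you do want sorted groups, one possible sketch is to put an xsl:sort inside the relevant xsl:apply-templates. The version of the header1 template below assumes you want the code groups in ascending order of Code, compared as text. That sort order is my assumption, not something stated in your rules, and it would change the order of the CodeHeader elements compared with the output shown earlier.

    <xsl:template match="AllocFile" mode="header1">
       <Header1>
          <Hd1>
             <xsl:value-of select="hd1"/>
          </Hd1>
          <xsl:apply-templates select="key('header1', hd1)[generate-id() = generate-id(key('header2', concat(hd1, '|', Code))[1])]" mode="header2">
             <!-- Assumption: order the code groups by Code, ascending, compared as text -->
             <xsl:sort select="Code" data-type="text" order="ascending"/>
          </xsl:apply-templates>
       </Header1>
    </xsl:template>

    The same approach applies at the other levels: add an xsl:sort as a child of whichever xsl:apply-templates produces the elements you want ordered.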