xml, xslt, xslt-2.0, xslt-grouping

How to retrieve an XML document and group nodes based on an attribute value using XSLT


Let's assume that my transaction body is completely empty when I start the transformation.

I have an XML file with some data like this:

<?xml version="1.0" encoding="UTF-8"?>
<codes>
    <code sourceCode="10" targetCode="06" area="MAIL"/>
    <code sourceCode="11" targetCode="1" area="PROFESSION"/>
    <code sourceCode="11" targetCode="11" area="HOME"/>
    <code sourceCode="12" targetCode="13" area="HOME"/>
    <code sourceCode="12" targetCode="3" area="HOME"/>
    <code sourceCode="13" targetCode="10" area="PROFESSION"/>
    <code sourceCode="14" targetCode="01" area="WORK"/>
    <code sourceCode="14" targetCode="05" area="MAIL"/>
</codes>

I need the result to be something like this:

<codes>
    <code>
        <sourceCode>10</sourceCode>
        <areas>
            <area>MAIL</area>
        </areas>
    </code>
    <code>
        <sourceCode>11</sourceCode>
        <areas>
            <area>PROFESSION</area>
            <area>HOME</area>
        </areas>
    </code>
    <!-- This code has two different targetCodes with the same area 
        but in the result it should not have the same area twice in the areas list -->
    <code>
        <sourceCode>12</sourceCode>
        <areas>
            <area>HOME</area>
        </areas>
    </code>
    <code>
        <sourceCode>13</sourceCode>
        <areas>
            <area>PROFESSION</area>
        </areas>
    </code>
    <code>
        <sourceCode>14</sourceCode>
        <areas>
            <area>WORK</area>
            <area>MAIL</area>
        </areas>
    </code>
</codes>

There are two challenges here. The first is that the original document is not the body of the transaction; it has to be retrieved inside the XSL. The second is the transformation itself: grouping by the sourceCode attribute, dropping the targetCode attribute entirely, and not listing duplicate areas.

At the start of the XSL file this is how I retrieve the XML document:

<xsl:variable name="lookupDoc" select="document('../common/lookup/mapping.xml')"/>

EDIT: Finally, I would also like to filter out the area "MAIL" completely. In the final XML, the first code element would then be eliminated, while the last one would have just one area ("WORK").


Solution

  • Use a template like this (it relies on xsl:for-each-group and distinct-values(), so it needs an XSLT 2.0 processor):

      <xsl:template match="codes">
        <xsl:copy>
          <xsl:for-each-group select="code" group-by="@sourceCode">
            <xsl:copy>
              <sourceCode>
                <xsl:value-of select="current-grouping-key()"/>
              </sourceCode>
              <areas>
                <xsl:for-each select="distinct-values(current-group()/@area)">
                  <area>
                    <xsl:value-of select="."/>
                  </area>
                </xsl:for-each>
              </areas>
            </xsl:copy>
          </xsl:for-each-group>
        </xsl:copy>
      </xsl:template>
    

    and then push the lookup document through that template with <xsl:apply-templates select="$lookupDoc/node()"/>.
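
  • For the EDIT in the question (dropping area "MAIL"), one possible approach is to filter the code elements before they are grouped. The following is only a sketch of a complete stylesheet under some assumptions, not something tested against your setup: the predicate @area != 'MAIL', the template name "main" and the xsl:output settings are my additions, while the document() path is the one from the question.

      <?xml version="1.0" encoding="UTF-8"?>
      <xsl:stylesheet version="2.0"
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

        <!-- Assumption: indented XML output is wanted -->
        <xsl:output method="xml" indent="yes"/>

        <!-- Lookup document, loaded exactly as in the question -->
        <xsl:variable name="lookupDoc" select="document('../common/lookup/mapping.xml')"/>

        <!-- Entry point. The name "main" is an assumption; it lets a processor
             that supports named initial templates start without a source document -->
        <xsl:template match="/" name="main">
          <xsl:apply-templates select="$lookupDoc/node()"/>
        </xsl:template>

        <xsl:template match="codes">
          <xsl:copy>
            <!-- Filter before grouping: a sourceCode that only has MAIL entries
                 forms no group, so no code element is written for it -->
            <xsl:for-each-group select="code[@area != 'MAIL']" group-by="@sourceCode">
              <code>
                <sourceCode>
                  <xsl:value-of select="current-grouping-key()"/>
                </sourceCode>
                <areas>
                  <!-- distinct-values() removes repeated areas within the group -->
                  <xsl:for-each select="distinct-values(current-group()/@area)">
                    <area>
                      <xsl:value-of select="."/>
                    </area>
                  </xsl:for-each>
                </areas>
              </code>
            </xsl:for-each-group>
          </xsl:copy>
        </xsl:template>

      </xsl:stylesheet>

    Because the filter sits in the select of xsl:for-each-group, sourceCode 10 should disappear from the result and sourceCode 14 should keep only "WORK". Note that the order of the values returned by distinct-values() is implementation-dependent, so if the order of the area elements matters, a nested xsl:for-each-group over current-group() with group-by="@area" may be the safer choice.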