
XSLT grouping by every attribute


I have multiple types of xml messages I need to "compact" by grouping multiple nodes under the same parent (same parent meaning they share the same node name and every attribute declared is also equal). For example:

<TopLevel CodeTL="Something">
    <Ratings>
          <Rating CodeA="ABC" Start="1-1-2012" End="1-2-2012">
              <RatingByNumber Code="X" Rating="10" Number="1"/>
              <RatingByNumber Code="X" Rating="19" Number="2"/>
          </Rating>
    </Ratings>
</TopLevel>
<TopLevel CodeTL="Something">
    <Ratings>
          <Rating CodeA="ABC" Start="1-2-2012" End="1-3-2012">
              <RatingByNumber Code="X" Rating="10" Number="1"/>
              <RatingByNumber Code="X" Rating="19" Number="2"/>
          </Rating>
    </Ratings>
</TopLevel>
<TopLevel CodeTL="Something">
    <Ratings>
          <Rating CodeA="XYZ" Start="1-2-2012" End="1-3-2012">
              <RatingByNumber Code="X" Rating="10" Number="1"/>
              <RatingByNumber Code="X" Rating="19" Number="2"/>
          </Rating>
    </Ratings>
</TopLevel>
<TopLevel CodeTL="Something">
    <Ratings>
          <Rating CodeA="XYZ" Start="1-2-2012" End="1-3-2012">
              <RatingByNumber Code="X" Rating="30" Number="3"/>
              <RatingByNumber Code="X" Rating="39" Number="4"/>
          </Rating>
    </Ratings>
</TopLevel>

Notice that they all share the same CodeTL attribute, and the last two also share the same CodeA, Start and End attributes. What I need is to produce the following output using an XSLT:

<TopLevel CodeTL="Something">
    <Ratings>
          <Rating CodeA="ABC" Start="1-1-2012" End="1-2-2012">
              <RatingByNumber Code="X" Rating="10" Number="1"/>
              <RatingByNumber Code="X" Rating="19" Number="2"/>
          </Rating>
          <Rating CodeA="ABC" Start="1-2-2012" End="1-3-2012">
              <RatingByNumber Code="X" Rating="10" Number="1"/>
              <RatingByNumber Code="X" Rating="19" Number="2"/>
          </Rating>
          <Rating CodeA="XYZ" Start="1-2-2012" End="1-3-2012">
              <RatingByNumber Code="X" Rating="10" Number="1"/>
              <RatingByNumber Code="X" Rating="19" Number="2"/>
              <RatingByNumber Code="X" Rating="30" Number="3"/>
              <RatingByNumber Code="X" Rating="39" Number="4"/>
          </Rating>
    </Ratings>
</TopLevel>

which is much cleaner, saves space and, depending on the application consuming it, might also save processing time.

The problem is that I have different types of XML messages with different node names and attributes (and different numbers of attributes), but they all share the structure shown here. A generic way to handle all of them would be great, but I would also be grateful for an XSLT that transforms the example above, so that I can write custom code for every XML message I need to send out.


Solution

    I. This generic XSLT 2.0 transformation:

    <xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:xs="http://www.w3.org/2001/XMLSchema"
     xmlns:my="my:my" exclude-result-prefixes="xs my">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
    
     <xsl:template match="/*">
         <t>
           <xsl:sequence select="my:grouping(*)"/>
         </t>
     </xsl:template>
    
     <xsl:function name="my:grouping" as="node()*">
       <xsl:param name="pElems" as="element()*"/>
    
       <xsl:if test="$pElems">
           <xsl:for-each-group select="$pElems" group-by="my:signature(.)">
             <xsl:copy>
              <xsl:copy-of select="@*"/>
    
                <xsl:sequence select="my:grouping(current-group()/*)"/>
             </xsl:copy>
           </xsl:for-each-group>
       </xsl:if>
     </xsl:function>
    
     <xsl:function name="my:signature" as="xs:string">
      <xsl:param name="pElem" as="element()"/>
    
      <xsl:variable name="vsignAttribs" as="xs:string*">
          <xsl:for-each select="$pElem/@*">
           <xsl:sort select="name()"/>
    
           <xsl:value-of select="concat(name(), '=', .,'|')"/>
          </xsl:for-each>
      </xsl:variable>
    
      <xsl:sequence select=
      "concat(name($pElem), '|', string-join($vsignAttribs, ''))"/>
     </xsl:function>
    </xsl:stylesheet>
    

    when applied to the provided XML (wrapped in a single top element to make it a well-formed XML document):

    <t>
        <TopLevel CodeTL="Something">
            <Ratings>
                  <Rating CodeA="ABC" Start="1-1-2012" End="1-2-2012">
                      <RatingByNumber Code="X" Rating="10" Number="1"/>
                      <RatingByNumber Code="X" Rating="19" Number="2"/>
                  </Rating>
            </Ratings>
        </TopLevel>
            <TopLevel CodeTL="Something">
            <Ratings>
                  <Rating CodeA="ABC" Start="1-2-2012" End="1-3-2012">
                      <RatingByNumber Code="X" Rating="10" Number="1"/>
                      <RatingByNumber Code="X" Rating="19" Number="2"/>
                  </Rating>
            </Ratings>
        </TopLevel>
        <TopLevel CodeTL="Something">
            <Ratings>
                  <Rating CodeA="XYZ" Start="1-2-2012" End="1-3-2012">
                      <RatingByNumber Code="X" Rating="10" Number="1"/>
                      <RatingByNumber Code="X" Rating="19" Number="2"/>
                  </Rating>
            </Ratings>
        </TopLevel>
        <TopLevel CodeTL="Something">
            <Ratings>
                  <Rating CodeA="XYZ" Start="1-2-2012" End="1-3-2012">
                      <RatingByNumber Code="X" Rating="30" Number="3"/>
                      <RatingByNumber Code="X" Rating="39" Number="4"/>
                  </Rating>
            </Ratings>
        </TopLevel>
    </t>
    

    produces the wanted, correct result:

    <t>
       <TopLevel CodeTL="Something">
          <Ratings>
             <Rating CodeA="ABC" Start="1-1-2012" End="1-2-2012">
                <RatingByNumber Code="X" Rating="10" Number="1"/>
                <RatingByNumber Code="X" Rating="19" Number="2"/>
             </Rating>
             <Rating CodeA="ABC" Start="1-2-2012" End="1-3-2012">
                <RatingByNumber Code="X" Rating="10" Number="1"/>
                <RatingByNumber Code="X" Rating="19" Number="2"/>
             </Rating>
             <Rating CodeA="XYZ" Start="1-2-2012" End="1-3-2012">
                <RatingByNumber Code="X" Rating="10" Number="1"/>
                <RatingByNumber Code="X" Rating="19" Number="2"/>
                <RatingByNumber Code="X" Rating="30" Number="3"/>
                <RatingByNumber Code="X" Rating="39" Number="4"/>
             </Rating>
          </Ratings>
       </TopLevel>
    </t>
    

    Explanation:

    1. The grouping is implemented in the function my:grouping() and is recursive.

    2. The top element is the only one at its level, so it needs no grouping; it is simply reproduced (as the literal result element <t> in this stylesheet), and inside it the lower levels are grouped by the function my:grouping().

    3. The function my:grouping() takes a single argument: all the child elements of all the elements in one group at the level immediately above. It returns all the groups at the current level.

    4. The sequence of elements passed as the argument is grouped by signature: the name of the element concatenated with the name=value pairs of all its attributes (sorted by attribute name), separated by '|' delimiters. The signature of an element is produced by the function my:signature().
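
    To see what the grouping keys actually look like, a small diagnostic stylesheet can print the signature of every element. This is only a sketch for inspection, not part of the solution: it reuses my:signature() unchanged from the transformation above, and the text output method and the one-signature-per-line layout are choices made for this illustration.

    ```xml
    <xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:xs="http://www.w3.org/2001/XMLSchema"
     xmlns:my="my:my" exclude-result-prefixes="xs my">
     <xsl:output method="text"/>

     <!-- Diagnostic only: print the signature of every element, one per line -->
     <xsl:template match="/">
       <xsl:for-each select="//*">
         <xsl:value-of select="my:signature(.)"/>
         <xsl:text>&#10;</xsl:text>
       </xsl:for-each>
     </xsl:template>

     <!-- Same function as in the transformation above -->
     <xsl:function name="my:signature" as="xs:string">
      <xsl:param name="pElem" as="element()"/>

      <xsl:variable name="vsignAttribs" as="xs:string*">
          <xsl:for-each select="$pElem/@*">
           <!-- Sorting makes the signature independent of attribute order -->
           <xsl:sort select="name()"/>

           <xsl:value-of select="concat(name(), '=', .,'|')"/>
          </xsl:for-each>
      </xsl:variable>

      <xsl:sequence select=
      "concat(name($pElem), '|', string-join($vsignAttribs, ''))"/>
     </xsl:function>
    </xsl:stylesheet>
    ```

    For the first Rating element of the sample document this should yield Rating|CodeA=ABC|End=1-2-2012|Start=1-1-2012| and for a Ratings element just Ratings|. Two elements land in the same group exactly when these strings are equal, which is why the attribute order in the source does not matter. Note that an attribute value containing '|' or '=' could, in principle, make two different elements produce the same signature; the sample data has no such values.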


    II. Generic XSLT 1.0 solution:

    <xsl:stylesheet version="1.0"
             xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
             xmlns:ext="http://exslt.org/common"
             xmlns:my="my:my" exclude-result-prefixes="my ext">
             <xsl:output omit-xml-declaration="yes" indent="yes"/>
             <xsl:strip-space elements="*"/>
    
             <xsl:variable name="vrtfPass1">
              <xsl:apply-templates select="/*"/>
             </xsl:variable>
    
             <xsl:variable name="vPass1" select="ext:node-set($vrtfPass1)"/>
    
             <xsl:template match="/">
              <xsl:apply-templates select="$vPass1/*" mode="pass2"/>
             </xsl:template>
    
             <xsl:template match="/*" mode="pass2">
                 <xsl:copy>
                   <xsl:call-template name="my:grouping">
                    <xsl:with-param name="pElems" select="*"/>
                   </xsl:call-template>
                 </xsl:copy>
             </xsl:template>
    
             <xsl:template name="my:grouping">
               <xsl:param name="pElems" select="/.."/>
    
               <xsl:if test="$pElems">
                 <xsl:for-each select="$pElems">
                  <xsl:variable name="vPos" select="position()"/>
    
                  <xsl:if test=
                   "not(current()/@my:sign
                       = $pElems[not(position() >= $vPos)]/@my:sign
                       )">
    
                     <xsl:element name="{name()}">
                      <xsl:copy-of select="namespace::*[not(. = 'my:my')]"/>
                      <xsl:copy-of select="@*[not(name()='my:sign')]"/>
                       <xsl:call-template name="my:grouping">
                        <xsl:with-param name="pElems" select=
                        "$pElems[@my:sign = current()/@my:sign]/*"/>
                       </xsl:call-template>
                     </xsl:element>
                   </xsl:if>
    
                 </xsl:for-each>
               </xsl:if>
             </xsl:template>
    
         <xsl:template match="/*">
                 <xsl:copy>
                   <xsl:apply-templates/>
                 </xsl:copy>
         </xsl:template>
    
         <xsl:template match="*/*">
          <xsl:variable name="vSignature">
           <xsl:call-template name="signature"/>
          </xsl:variable>
          <xsl:copy>
           <xsl:copy-of select="@*"/>
           <xsl:attribute name="my:sign">
            <xsl:value-of select="$vSignature"/>
           </xsl:attribute>
    
           <xsl:apply-templates/>
          </xsl:copy>
         </xsl:template>
    
         <xsl:template name="signature">
           <xsl:variable name="vsignAttribs">
             <xsl:for-each select="@*">
              <xsl:sort select="name()"/>
    
                    <xsl:value-of select="concat(name(), '=', .,'|')"/>
                 </xsl:for-each>
            </xsl:variable>
    
            <xsl:value-of select=
              "concat(name(), '|', $vsignAttribs)"/>
         </xsl:template>
    </xsl:stylesheet>
    

    When this transformation is applied on the same XML document (above), again the same correct result is produced:

    <t>
       <TopLevel CodeTL="Something">
          <Ratings>
             <Rating CodeA="ABC" Start="1-1-2012" End="1-2-2012">
                <RatingByNumber Code="X" Rating="10" Number="1"/>
                <RatingByNumber Code="X" Rating="19" Number="2"/>
             </Rating>
             <Rating CodeA="ABC" Start="1-2-2012" End="1-3-2012">
                <RatingByNumber Code="X" Rating="10" Number="1"/>
                <RatingByNumber Code="X" Rating="19" Number="2"/>
             </Rating>
             <Rating CodeA="XYZ" Start="1-2-2012" End="1-3-2012">
                <RatingByNumber Code="X" Rating="10" Number="1"/>
                <RatingByNumber Code="X" Rating="19" Number="2"/>
                <RatingByNumber Code="X" Rating="30" Number="3"/>
                <RatingByNumber Code="X" Rating="39" Number="4"/>
             </Rating>
          </Ratings>
       </TopLevel>
    </t>
    

    Explanation:

    1. This is a two-pass transformation.

    2. In the first pass a signature is calculated for every element and becomes the value of a new attribute, my:sign.

    3. The same recursive grouping algorithm is used as with the XSLT 2.0 solution.
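
    As an illustration of the first pass, the intermediate tree held in $vPass1 should look roughly like the fragment below for the first TopLevel element. This was derived by hand from the templates above rather than captured from a processor, so the exact placement of the xmlns:my declaration and the attribute order may differ in practice.

    ```xml
    <t>
       <TopLevel xmlns:my="my:my" CodeTL="Something"
                 my:sign="TopLevel|CodeTL=Something|">
          <Ratings my:sign="Ratings|">
             <Rating CodeA="ABC" Start="1-1-2012" End="1-2-2012"
                     my:sign="Rating|CodeA=ABC|End=1-2-2012|Start=1-1-2012|">
                <RatingByNumber Code="X" Rating="10" Number="1"
                                my:sign="RatingByNumber|Code=X|Number=1|Rating=10|"/>
                <RatingByNumber Code="X" Rating="19" Number="2"
                                my:sign="RatingByNumber|Code=X|Number=2|Rating=19|"/>
             </Rating>
          </Ratings>
       </TopLevel>
       <!-- the remaining TopLevel elements are annotated in the same way -->
    </t>
    ```

    The second pass compares these my:sign values to decide which elements to merge, and copies every attribute except my:sign into the final result.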