xslt, xslt-2.0, saxon, xslt-grouping, exslt

Remove duplicate elements based on attributes and values


There are many questions about removing duplicate elements when you can group them by a known attribute or value. In my case, however, the attributes are generated dynamically in the XSLT, and I don't want to hard-code every attribute of every element as a grouping key.

How do you remove duplicate elements without knowing their attributes in advance? So far, I've tried calling generate-id() on each element and grouping by the result, but generate-id() does not return the same ID for elements that have the same attributes:

<xsl:template match="root">
    <xsl:variable name="tempIds">
        <xsl:for-each select="./*">
            <xsl:copy>
                <xsl:copy-of select="@*"/>
                <xsl:attribute name="tempID">
                    <xsl:value-of select="generate-id(.)"/>
                </xsl:attribute>
                <xsl:copy-of select="node()"/>
            </xsl:copy>
        </xsl:for-each>
    </xsl:variable>
    <xsl:for-each-group select="$tempIds/*" group-by="@tempID">
        <xsl:sequence select="."/>
    </xsl:for-each-group>
</xsl:template>

Test data:

<root>
    <child1>
        <etc/>
    </child1>
    <dynamicElement1 a="2" b="3"/>
    <dynamicElement2 c="3" d="4"/>
    <dynamicElement2 c="3" d="5"/>
    <dynamicElement1 a="2" b="3"/>
</root>

The desired result keeps only one of the two identical dynamicElement1 elements:

<root>
    <child1>
        <etc/>
    </child1>
    <dynamicElement1 a="2" b="3"/>
    <dynamicElement2 c="3" d="4"/>
    <dynamicElement2 c="3" d="5"/>
</root>

Solution

  • In XSLT 3, as shown in https://xsltfiddle.liberty-development.net/pPqsHTi, you can group on a composite key made up of all attributes, for example:

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        version="3.0">
    
      <xsl:mode on-no-match="shallow-copy"/>
    
      <xsl:output indent="yes"/>
    
      <xsl:template match="root">
          <xsl:copy>
              <xsl:for-each-group select="*" composite="yes" group-by="@*">
                  <xsl:sequence select="."/>
              </xsl:for-each-group>
          </xsl:copy>
      </xsl:template>
    
    </xsl:stylesheet>
    

    Note that attributes are technically unordered, so it is safer to sort them (by namespace URI and local name, node-name() or similar) before using them as the grouping key. https://xsltfiddle.liberty-development.net/pPqsHTi/2 does this in XSLT 3 without higher-order functions:

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:mf="http://example.com/mf"
        version="3.0">
    
      <xsl:mode on-no-match="shallow-copy"/>
    
      <xsl:output indent="yes"/>
    
      <xsl:function name="mf:node-sort" as="node()*">
          <xsl:param name="input-nodes" as="node()*"/>
          <xsl:perform-sort select="$input-nodes">
              <xsl:sort select="namespace-uri()"/>
              <xsl:sort select="local-name()"/>
          </xsl:perform-sort>
      </xsl:function>
    
      <xsl:template match="root">
          <xsl:copy>
              <xsl:for-each-group select="*" composite="yes" group-by="mf:node-sort(@*)">
                  <xsl:sequence select="."/>
              </xsl:for-each-group>
          </xsl:copy>
      </xsl:template>
    
    </xsl:stylesheet>
    

    or, with a processor that supports higher-order functions such as Saxon EE, you can call the XPath 3.1 sort function directly:

    <xsl:template match="root">
        <xsl:copy>
            <xsl:for-each-group select="*" composite="yes" group-by="sort(@*, (), function($att) { namespace-uri($att), local-name($att) })">
                <xsl:sequence select="."/>
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>
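
    Since the question is also tagged xslt-2.0, here is a minimal sketch of an alternative that avoids grouping keys altogether and is not part of the fiddles above. It assumes that "duplicate" means deep-equal() to an earlier sibling, which compares element name, attributes (in any order) and content, whereas the composite keys above compare only the atomized attribute values. The test is pairwise, so it may be slow on a root with very many children.

    ```xml
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        version="2.0">

      <!-- Drop whitespace-only text nodes so removed elements leave no gaps. -->
      <xsl:strip-space elements="*"/>

      <xsl:output indent="yes"/>

      <!-- Identity template: copy everything not matched by a more specific rule. -->
      <xsl:template match="@* | node()">
        <xsl:copy>
          <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
      </xsl:template>

      <!-- Suppress a child of root when some earlier sibling is deep-equal to it,
           so only the first element of each set of duplicates is kept. -->
      <xsl:template match="root/*[some $prev in preceding-sibling::*
                                  satisfies deep-equal(., $prev)]"/>

    </xsl:stylesheet>
    ```

    Applied to the test data, this should keep the first dynamicElement1 and drop the second, while both dynamicElement2 elements remain because their d attributes differ.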