
Sort list and eliminate duplicates in XML file with XSLT NEW


I already asked this question, but I chose a poor example list, which led to answers that weren't useful. I then thought I had found a solution, but it produces wrong results. So let me ask again.

The input XML is a list containing many occurrences of different elements, which carry different attributes with different values. Example XML:

<attributes>
        <para role="tocmain1"/>
        <para role="tocmain1"/>
        <other style="fix"/>
        <other style="fix1"/>
        <para role="tocmain2"/>
        <para role="tocmain2"/>
        <para role="tocmain2"/>
        <para role="tocmain3"/>
        <para role="tocmain3"/>
        <para language="de"/>
        <para language="de"/>
        <para role="tocmain3"/>
</attributes>

The result should be a list that contains only one occurrence of each element + attribute + value combination, sorted alphabetically in this order:

  1. alphabetical order of elements
  2. alphabetical order of attributes
  3. alphabetical order of values.

Example result:

<attributes>
     <other style="fix"/>
     <other style="fix1"/>
     <para language="de"/>
     <para role="tocmain1"/>
     <para role="tocmain2"/>
     <para role="tocmain3"/>    
</attributes>

Right now I'm using two XSLT stylesheets that are executed consecutively, and the problem is that the resulting list is incomplete: some combinations of element + attribute + value are missing. The problem is in the first stylesheet, because I group by attribute value and take only the first occurrence of each group. The same attribute value can be used with different attributes; in those cases the second occurrence is missing. Is there a way to group on the combination of attribute + value?

1. XSLT (grouping and eliminating duplicate items):

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>       
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>      
  <xsl:template match="/*">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>         
      <xsl:for-each-group select="*" group-by="@*">
        <xsl:sort select="@*"/> 
        <xsl:apply-templates select="current-group()[1]"/>          
      </xsl:for-each-group>                     
    </xsl:copy>
  </xsl:template>       
</xsl:stylesheet>

2. XSLT (sorting alphabetically):

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>
    <xsl:template match="/">
        <attributes>
            <xsl:for-each select="attributes/node()">                 
                <xsl:sort select="name()" order="ascending"/> 
                <xsl:sort select="name(@*)" order="ascending"/>                   
                <xsl:sort select="@*" order="ascending"/>                  
                <xsl:copy-of select="."/>                  
            </xsl:for-each>   
        </attributes>
    </xsl:template>           
</xsl:stylesheet>

Any help is very welcome, and sorry for the misleading question on the first try!


Solution

  • Assuming XSLT 3.0, as supported by Saxon 9.7 PE or EE or AltovaXML 2017, you can simply use a composite key:

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:math="http://www.w3.org/2005/xpath-functions/math"
        exclude-result-prefixes="xs math"
        version="3.0">
    
        <xsl:output indent="yes"/>  
    
        <xsl:template match="/*">
            <xsl:copy>
                <xsl:for-each-group select="*" group-by="node-name(), node-name(@*[1]), @*[1]" composite="yes">
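                    <xsl:sort select="string(current-grouping-key()[1])"/>
                    <xsl:sort select="string(current-grouping-key()[2])"/>
                    <!-- third key component: the attribute value -->
                    <xsl:sort select="string(current-grouping-key()[3])"/>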
                    <xsl:copy-of select="."/>
                </xsl:for-each-group>
            </xsl:copy>
        </xsl:template>
    
    </xsl:stylesheet>
    

    With XSLT 2.0 you can use nested for-each-groups:

    <xsl:template match="/*">
        <xsl:copy>
            <xsl:for-each-group select="*" group-by="node-name(.)">
                <xsl:sort select="string(current-grouping-key())"/>
                <xsl:for-each-group select="current-group()" group-by="node-name(@*[1])">
                    <xsl:sort select="string(current-grouping-key())"/>                 
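                    <xsl:for-each-group select="current-group()" group-by="@*[1]">
                        <!-- also order the groups by attribute value -->
                        <xsl:sort select="string(current-grouping-key())"/>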
                        <xsl:copy-of select="."/>
                    </xsl:for-each-group>
                </xsl:for-each-group>
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>
    

    or use a single key built by string concatenation, e.g. group-by="concat(node-name(.), '|', node-name(@*[1]), '|', @*[1])" (in XSLT 2.0, node-name needs an explicit argument).
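
As a rough sketch of that concatenated-key variant, a complete XSLT 2.0 stylesheet might look like the following. It is not taken from the answer above. It assumes that the elements are in no namespace (so name() is used instead of node-name()), that each element has exactly one attribute, and that the separator '|' never occurs in an element name, attribute name, or attribute value.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <xsl:template match="/*">
    <xsl:copy>
      <!-- One group per element name + attribute name + attribute value.
           Assumption: '|' does not occur in any of the three parts. -->
      <xsl:for-each-group select="*"
          group-by="concat(name(), '|', name(@*[1]), '|', @*[1])">
        <!-- Sort keys are evaluated against the first item of each group;
             the three parts are sorted separately, not as one joined string. -->
        <xsl:sort select="name()"/>
        <xsl:sort select="name(@*[1])"/>
        <xsl:sort select="string(@*[1])"/>
        <!-- Inside the loop the context item is the first element of the group,
             so copying it emits exactly one representative per combination. -->
        <xsl:copy-of select="."/>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Sorting on the three parts separately, rather than on the joined key, avoids the separator character influencing the order (for example when one element name is a prefix of another). This does grouping and sorting in one pass, so the second stylesheet from the question should not be needed.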

    These suggestions assume that different elements can have different attributes, but that each element has exactly one attribute.