
How to get unique structure of elements using xsl:key


Please suggest how to avoid listing duplicate elements using xsl:key. I already get the required result with a variable-based method, but it is not an efficient way.

In my input, 'Ref' is the main element, and it has several descendants. I need to list only those 'Ref' elements whose structure (element names only, not content) is unique. For example, given <Ref><a>1</a><b>3</b></Ref> and <Ref><a>1001</a><b>2001</b></Ref>, only the first <Ref> should be displayed. In the given input, 'au' and 'ed' elements and their descendants are ignored.

Input XML:

<article>
<Ref id="ref1">
    <RefText>
        <authors><au><snm>Kishan</snm><fnm>TR</fnm></au><au><snm>Rudramuni</snm><fnm>TP</fnm></au></authors>
        <artTitle>The article1</artTitle><jTitle>Journal title</jTitle>
        <Year>2016</Year><vol>1</vol>
        <fpage>12</fpage><lpage>14</lpage>
    </RefText></Ref><!-- should be listed -->

<Ref id="ref2">
    <RefText>
        <authors><au><snm>Rudramuni</snm><fnm>TP</fnm></au></authors>
        <artTitle>The article1</artTitle><jTitle>Journal title</jTitle>
        <Year>2017</Year><vol>2</vol>
        <fpage>22</fpage><lpage>24</lpage>
        </RefText></Ref><!-- This Ref should not be listed in the output XML: it has 'authors', 'artTitle' and the same other element types, so ref2 has the same structure as ref1. -->

<Ref id="ref3">
    <RefText>
        <authors><au><snm>Likhith</snm><fnm>MD</fnm></au></authors>
        <artTitle>The article1</artTitle><jTitle>Journal title</jTitle>
        <Year>2017</Year><fpage>22</fpage><lpage>24</lpage>
        </RefText></Ref><!-- should be listed, because 'vol' is missing here, so its structure is unique with respect to the previous Refs -->

<Ref id="ref4">
    <RefText>
        <authors><au><snm>Kowshik</snm><fnm>MD</fnm></au></authors>
        <artTitle>The article1</artTitle><jTitle>Journal title</jTitle>
        <Year>2017</Year><fpage>22</fpage>
        </RefText></Ref><!-- should be listed, because 'lpage' is missing -->

<Ref id="ref5">
    <RefText>
        <editors><au><snm>Dhyan</snm><fnm>MD</fnm></au></editors>
        <artTitle>The article1</artTitle><jTitle>Journal title</jTitle>
        <Year>2017</Year><fpage>22</fpage>
        </RefText></Ref><!-- should be listed, because it has 'editors' instead of 'authors' -->

<Ref id="ref6">
    <RefText>
        <editors><ed><snm>Kishan</snm><fnm>TR</fnm></ed></editors>
        <artTitle>The article1</artTitle><jTitle>Journal title</jTitle>
        <Year>2017</Year>
        </RefText></Ref><!-- should be listed -->

<Ref id="ref7">
    <RefText>
        <editors><ed><snm>Vivan</snm><fnm>S</fnm></ed></editors>
        <artTitle>The article1</artTitle><jTitle>Journal title</jTitle>
        <Year>2017</Year>
        </RefText></Ref><!-- should not be listed: ref6 and ref7 have the same element types -->

<Ref id="ref8">
    <RefText><editors><au><snm>Dhyan</snm><fnm>MD</fnm></au><au><snm>Dhyan</snm><fnm>MD</fnm></au></editors>
        <artTitle>The article1</artTitle><jTitle>Journal title</jTitle>
        <Year>2017</Year><fpage>22</fpage>
        </RefText></Ref><!-- should not be listed, because ref5 and ref8 have the same elements -->

</article>

XSLT 2.0: Here, I use variables to store the descendant names of the preceding Ref elements.

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >

<xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
</xsl:template>

<xsl:template match="article">
    <article>

        <xsl:for-each select="descendant::Ref">
            <xsl:variable name="varPrev">
                <xsl:for-each select="preceding::Ref">
                    <a>
                        <xsl:text>|</xsl:text>
                        <xsl:for-each select="descendant::*[not(ancestor-or-self::au) and not(ancestor-or-self::ed)]">
                            <xsl:value-of select="name()"/>
                        </xsl:for-each>
                        <xsl:text>|</xsl:text>
                    </a>
                </xsl:for-each>
            </xsl:variable>
            <xsl:variable name="varPresent">
                <a>
                    <xsl:text>|</xsl:text>
                    <xsl:for-each select="descendant::*[not(ancestor-or-self::au) and not(ancestor-or-self::ed)]">
                        <xsl:value-of select="name()"/>
                    </xsl:for-each>
                    <xsl:text>|</xsl:text>
                </a>
            </xsl:variable>
            <xsl:if test="not(contains($varPrev, $varPresent))">
                <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
            </xsl:if>

        </xsl:for-each>
    </article>
</xsl:template>

<!--xsl:key name="keyRef" match="Ref" use="descendant::*"/>

<xsl:template match="article">
    <xsl:for-each select="descendant::Ref">
        <xsl:if test="count('keyRef', ./name())=1">
            <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
        </xsl:if>
    </xsl:for-each>
</xsl:template-->

</xsl:stylesheet>

Required Result:

<article>
<Ref id="ref1">
    <RefText>
        <authors><au><snm>Kishan</snm><fnm>TR</fnm></au><au><snm>Rudramuni</snm><fnm>TP</fnm></au></authors>
        <artTitle>The article1</artTitle><jTitle>Journal title</jTitle>
        <Year>2016</Year><vol>1</vol>
        <fpage>12</fpage><lpage>14</lpage>
    </RefText></Ref>
<Ref id="ref3">
    <RefText>
        <authors><au><snm>Likhith</snm><fnm>MD</fnm></au></authors>
        <artTitle>The article1</artTitle><jTitle>Journal title</jTitle>
        <Year>2017</Year><fpage>22</fpage><lpage>24</lpage>
        </RefText></Ref>
<Ref id="ref4">
    <RefText>
        <authors><au><snm>Kowshik</snm><fnm>MD</fnm></au></authors>
        <artTitle>The article1</artTitle><jTitle>Journal title</jTitle>
        <Year>2017</Year><fpage>22</fpage>
        </RefText></Ref>
<Ref id="ref5">
    <RefText><editors><au><snm>Dhyan</snm><fnm>MD</fnm></au></editors>
        <artTitle>The article1</artTitle><jTitle>Journal title</jTitle>
        <Year>2017</Year><fpage>22</fpage>
        </RefText></Ref>
<Ref id="ref6">
    <RefText>
        <editors><ed><snm>Kishan</snm><fnm>TR</fnm></ed></editors>
        <artTitle>The article1</artTitle><jTitle>Journal title</jTitle>
        <Year>2017</Year>
        </RefText></Ref>
</article>

Solution

  • Here is an attempt to use a key computed similarly to your string comparison:

    <?xml version="1.0" encoding="UTF-8" ?>
    <xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns:mf="http://example.com/mf" exclude-result-prefixes="mf xs">
    
        <xsl:function name="mf:fingerprint" as="xs:string">
            <xsl:param name="input-element" as="element()"/>
            <xsl:value-of select="for $d in $input-element/descendant::*[not(ancestor-or-self::au) and not(ancestor-or-self::ed)] return node-name($d)" separator="|"/>
        </xsl:function>
    
        <xsl:key name="group" match="Ref" use="mf:fingerprint(.)"/>
    
        <xsl:template match="@*|node()">
            <xsl:copy>
                <xsl:apply-templates select="@*|node()"/>
            </xsl:copy>
        </xsl:template>
    
        <xsl:template match="Ref[not(. is key('group', mf:fingerprint(.))[1])]"/>
    </xsl:transform>
    

    It seems to do the job at http://xsltransform.net/bwdwsC as far as I can tell, but I am not sure that concatenating the element names into a string is sufficient for all kinds of input.
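
    One case where a flat list of names falls short is nesting: <a><b/></a><c/> and <a/><b><c/></b> both yield a|b|c. As a sketch only, the variant below builds the fingerprint from each descendant's path relative to the Ref, and uses xsl:for-each-group to keep the first Ref of each group. The function name mf:path-fingerprint is made up for this illustration, and the article template assumes, like the template in the question, that only Ref elements need to be output inside article.

    ```xml
    <?xml version="1.0" encoding="UTF-8" ?>
    <xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns:mf="http://example.com/mf" exclude-result-prefixes="mf xs">

        <!-- Hypothetical helper: one path per descendant (e.g. RefText/authors),
             joined with '|', so that nesting becomes part of the fingerprint.
             au and ed elements and their descendants are skipped, as in the question. -->
        <xsl:function name="mf:path-fingerprint" as="xs:string">
            <xsl:param name="input-element" as="element()"/>
            <xsl:sequence select="string-join(
                for $d in $input-element/descendant::*[not(ancestor-or-self::au) and not(ancestor-or-self::ed)]
                return string-join(
                    for $a in ($d/ancestor-or-self::* except $input-element/ancestor-or-self::*)
                    return name($a),
                    '/'),
                '|')"/>
        </xsl:function>

        <!-- Identity template: copies everything that is not handled below. -->
        <xsl:template match="@*|node()">
            <xsl:copy>
                <xsl:apply-templates select="@*|node()"/>
            </xsl:copy>
        </xsl:template>

        <!-- Group the Ref elements by fingerprint; inside xsl:for-each-group the
             context item is the first Ref of the current group, so only that one is copied. -->
        <xsl:template match="article">
            <xsl:copy>
                <xsl:apply-templates select="@*"/>
                <xsl:for-each-group select="descendant::Ref" group-by="mf:path-fingerprint(.)">
                    <xsl:apply-templates select="."/>
                </xsl:for-each-group>
            </xsl:copy>
        </xsl:template>
    </xsl:transform>
    ```

    Unlike the key-based version, this drops the comments and whitespace between the Ref elements, because the article template only processes the grouped Ref elements. If those nodes matter, the key-based template can be kept and only the fingerprint function swapped.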