Keep only XML elements with unique attribute values with XSLT 3.0


Newbie here. I have a large XML file. I would like to keep only the <w> elements whose @lemma attribute value is not duplicated (i.e. keep only elements with a unique @lemma value).

Sample:

<p>
    <w xml:lang="arn" lemma="one">a</w>
    <w xml:lang="arn" lemma="two">b</w>
    <w xml:lang="arn" lemma="three">c</w>
    <w xml:lang="arn" lemma="one">d</w>
    <w xml:lang="arn" lemma="two">e</w>
</p>

Output should be:

<?xml version="1.0" encoding="UTF-8"?>
<p>
    <w xml:lang="arn" lemma="three">c</w>
</p>

since it is the only <w> with @lemma="three".

Many thanks!


Solution

  • Expanding on Martin's answer, group the <w> elements by @lemma and copy only the groups that contain a single element:

    <xsl:for-each-group select="w" group-by="@lemma">
      <xsl:if test="count(current-group()) = 1">
        <xsl:copy-of select="current-group()"/>
      </xsl:if>
    </xsl:for-each-group>
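
    The fragment above needs a context in which `w` selects the child elements of <p>. A minimal sketch of a complete stylesheet follows, assuming the <w> elements are direct children of <p> as in the sample; the `xsl:output` and `xsl:strip-space` settings are my own additions for readable output and are not part of the original answer. Note that a <w> without a @lemma attribute falls into no group and is therefore dropped.

    ```xml
    <xsl:stylesheet version="3.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <!-- Assumption: indented output is wanted; drop these two lines if not -->
      <xsl:output method="xml" indent="yes"/>
      <xsl:strip-space elements="p"/>

      <xsl:template match="p">
        <xsl:copy>
          <!-- keep any attributes of <p> itself -->
          <xsl:copy-of select="@*"/>
          <!-- one group per distinct @lemma value, in order of first appearance -->
          <xsl:for-each-group select="w" group-by="@lemma">
            <!-- a group of size 1 means the lemma occurs exactly once -->
            <xsl:if test="count(current-group()) = 1">
              <xsl:copy-of select="current-group()"/>
            </xsl:if>
          </xsl:for-each-group>
        </xsl:copy>
      </xsl:template>

    </xsl:stylesheet>
    ```

    If <p> is nested inside other elements that should be preserved, adding `<xsl:mode on-no-match="shallow-copy"/>` would copy everything else through unchanged.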
    

    You haven't said what you mean by "large", but if the file is so large that you need streaming, a solution using xsl:iterate would be possible. It can't be 100% streamed, of course, because you have to remember which @lemma values you have already seen and hold on to every element that is still unique so far. With a semi-streamed solution, though, you can discard an element as soon as you know its @lemma value is duplicated.
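
    The bookkeeping described here can be sketched with xsl:iterate and a map. This is only an illustration of the idea, written without streaming enabled: the parameter name `seen` is hypothetical, and a genuinely streamed version would additionally need a streaming-capable processor, a streamable mode, and copies of the candidate elements rather than references to them. The map goes from each @lemma value to the single candidate element, or to the empty sequence once a duplicate has been found, so duplicated elements are released as soon as they are detected.

    ```xml
    <xsl:stylesheet version="3.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:map="http://www.w3.org/2005/xpath-functions/map"
        exclude-result-prefixes="xs map">

      <xsl:output method="xml" indent="yes"/>

      <xsl:template match="p">
        <xsl:copy>
          <xsl:iterate select="w">
            <!-- lemma value -> sole candidate so far, or () once duplicated -->
            <xsl:param name="seen" as="map(xs:string, element(w)?)"
                       select="map{}"/>
            <xsl:on-completion>
              <!-- map:keys() has no defined order; "/ ." restores document order -->
              <xsl:copy-of select="(map:keys($seen) ! $seen(.)) / ."/>
            </xsl:on-completion>
            <xsl:variable name="key" select="string(@lemma)"/>
            <xsl:next-iteration>
              <!-- first sighting: remember the element; any later sighting: discard -->
              <xsl:with-param name="seen"
                  select="map:put($seen, $key,
                            if (map:contains($seen, $key)) then () else .)"/>
            </xsl:next-iteration>
          </xsl:iterate>
        </xsl:copy>
      </xsl:template>

    </xsl:stylesheet>
    ```

    Unlike the grouping version, this sketch treats a missing @lemma as the empty string rather than dropping the element, so the two are only equivalent when every <w> carries the attribute.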