Newbie here. I have a large XML file.
I would like to keep only the <w> elements
which have an attribute @lemma
with no duplicates (i.e. keep only the elements whose @lemma
value occurs exactly once in the file).
Sample:
<p>
<w xml:lang="arn" lemma="one">a</w>
<w xml:lang="arn" lemma="two">b</w>
<w xml:lang="arn" lemma="three">c</w>
<w xml:lang="arn" lemma="one">d</w>
<w xml:lang="arn" lemma="two">e</w>
</p>
The output should be:
<?xml version="1.0" encoding="UTF-8"?>
<p>
<w xml:lang="arn" lemma="three">c</w>
</p>
since it is the only <w> with @lemma="three".
Many thanks!
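Expanding on Martin's answer: in XSLT 2.0 or later, group the <w> elements by @lemma and copy only the groups that contain a single element (this assumes the parent <p> is the context item):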
<xsl:for-each-group select="w" group-by="@lemma">
  <xsl:if test="count(current-group()) = 1">
    <xsl:copy-of select="current-group()"/>
  </xsl:if>
</xsl:for-each-group>
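For completeness, here is a minimal sketch of a whole stylesheet built around that fragment. It assumes the structure of the sample in the question, i.e. the <w> elements are direct children of a <p> element; the match pattern and the indent setting are my assumptions, so adjust them to the real file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: <w> elements are direct children of <p>, as in the sample -->
  <xsl:template match="p">
    <xsl:copy>
      <!-- A <w> without @lemma has an empty grouping key and so joins no group -->
      <xsl:for-each-group select="w" group-by="@lemma">
        <!-- Keep a group only if its @lemma value occurred exactly once -->
        <xsl:if test="count(current-group()) = 1">
          <xsl:copy-of select="current-group()"/>
        </xsl:if>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

For the sample in the question this keeps only the <w> with lemma="three". It needs an XSLT 2.0 (or later) processor, since xsl:for-each-group does not exist in XSLT 1.0.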
You haven't said what you mean by "large", but if the file is so large that you need to use streaming, then a solution using xsl:iterate
would be possible. It can't be 100% streamed, of course, because you need to track which keys you have seen and hold on to each element whose key has appeared only once so far, but with a semi-streamed solution you can discard elements as soon as you know they are duplicates.
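A rough sketch of the bookkeeping behind the xsl:iterate idea, for XSLT 3.0. This is only an illustration of the logic, not a tested streaming solution: it is not declared streamable, and the parameter name `seen` and the match on <p> are my assumptions. The map holds, for each @lemma value, either the single <w> seen so far or an empty sequence once a duplicate has turned up, so a duplicated element is dropped as soon as its second occurrence is read.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    exclude-result-prefixes="map">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: <w> elements are direct children of <p>, as in the sample -->
  <xsl:template match="p">
    <xsl:copy>
      <xsl:iterate select="w[@lemma]">
        <!-- seen: @lemma value -> the only <w> so far, or () once duplicated -->
        <xsl:param name="seen" select="map{}"/>
        <xsl:on-completion>
          <!-- Emit the surviving elements; the "/." step restores document order -->
          <xsl:sequence select="($seen?*)/."/>
        </xsl:on-completion>
        <xsl:variable name="key" select="string(@lemma)"/>
        <xsl:next-iteration>
          <xsl:with-param name="seen" select="
              if (map:contains($seen, $key))
              then map:put($seen, $key, ())
              else map:put($seen, $key, .)"/>
        </xsl:next-iteration>
      </xsl:iterate>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Memory use is bounded by the number of distinct @lemma values rather than by the number of <w> elements. Turning this into a genuinely streamed stylesheet would presumably mean declaring a streamable mode and storing a copy-of() snapshot of each element instead of the node itself; I have not checked that against any processor's streamability rules.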