xml xslt xpath-2.0

How to remove duplicate nodes based on their children?


Suppose I have the following sequence:

<node-1> <children-A attr="100" /> </node-1>
<node-1> <children-A attr="200" /> </node-1>     <!-- not a duplicate -->
<node-1> <children-B /> </node-1>
<node-1> <children-B /> </node-1>                <!-- duplicate of the line above -->
<node-1> <children-A /> <children-B /> </node-1> <!-- not a duplicate -->

I want to keep only the unique node-1 elements, so that the output would be:

<node-1> <children-A attr="100" /> </node-1>
<node-1> <children-A attr="200" /> </node-1>
<node-1> <children-B /> </node-1>
<node-1> <children-A /> <children-B /> </node-1>

NOTE: only the second <node-1> <children-B /> </node-1> has been removed.

Using Saxon 9.1.0.8, I've tried distinct-values($S), but its return type is xs:anyAtomicType* (a sequence of atomic values, not nodes) and I don't know how to turn that back into a sequence of node-1 elements (if that's even possible!).

I am, however, able to use count(distinct-values($S)) to check whether the number of items returned matches the actual number of unique elements, and it does.
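A likely reason distinct-values() can't be turned back into nodes: it atomizes each node-1 to its string value, i.e. the concatenation of its descendant text nodes. Element names and attribute values (such as attr="100") are not part of that value, so the element structure is gone before the comparison even happens. The small stylesheet below is only a sketch for inspecting this. It assumes the node-1 elements are children of the document element, and the variable name $S simply mirrors the question. It prints each distinct value in brackets so that whitespace-only values stay visible:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Assumption: the node-1 elements are children of the document element -->
    <xsl:variable name="S" as="element()*" select="/*/node-1"/>

    <!-- distinct-values() atomizes each node-1 to its string value (descendant
         text only), so the result is a sequence of atomic values, not nodes -->
    <xsl:for-each select="distinct-values($S)">
      <xsl:value-of select="concat('[', ., ']')"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Because these atomic values carry no reference back to the node they came from, there is no cast that turns them back into node-1 elements. A node-level comparison such as deep-equal() is needed instead.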


Solution

  • Using functx:distinct-deep (http://www.xsltfunctions.com/xsl/functx_distinct-deep.html), which in turn uses functx:is-node-in-sequence-deep-equal (http://www.xsltfunctions.com/xsl/functx_is-node-in-sequence-deep-equal.html), which in turn uses the XPath 2.0 function deep-equal, you could write <xsl:variable name="distinct-seq" select="functx:distinct-deep($your-sequence)"/>.

    Or, if you can't or don't want to include the functx library, you can inline its logic:

    <xsl:variable name="distinct-seq"
      select="(: keep the item at $pos only if no earlier item is deep-equal to it :)
              for $pos in (1 to count($your-sequence))
              return $your-sequence[$pos]
                                   [not(some $node in $your-sequence[position() lt $pos]
                                        satisfies deep-equal(., $node))]"/>
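Putting the pieces together, here is a minimal complete XSLT 2.0 stylesheet sketch. It assumes the five node-1 elements from the question are children of a single document element (for example a hypothetical <root>). The function my:distinct-deep and the namespace urn:example:my-functions are hypothetical names used for illustration. The function follows the same "keep the first, drop later deep-equal copies" idea as functx:distinct-deep, but it is not that library's verbatim code:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:my="urn:example:my-functions"
    exclude-result-prefixes="my">

  <!-- Strip whitespace-only text nodes so that indentation differences alone
       don't make two otherwise identical node-1 elements unequal -->
  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical helper: return each node unless an earlier node in the
       sequence is deep-equal to it (first occurrences win, order is kept) -->
  <xsl:function name="my:distinct-deep" as="node()*">
    <xsl:param name="nodes" as="node()*"/>
    <xsl:sequence select="
      for $pos in (1 to count($nodes))
      return $nodes[$pos]
                   [not(some $prev in $nodes[position() lt $pos]
                        satisfies deep-equal(., $prev))]"/>
  </xsl:function>

  <!-- Copy the document element, keeping only the distinct node-1 children -->
  <xsl:template match="/*">
    <xsl:copy>
      <xsl:copy-of select="my:distinct-deep(node-1)"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

With the question's sample input, only the second <node-1> <children-B /> </node-1> is deep-equal to an earlier element, so the other four are copied in their original order. The comparison is quadratic in the length of the sequence. That is usually fine for a handful of nodes, but for long sequences a grouping key (for example xsl:for-each-group over a serialized form of each node) may scale better.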