Matching and filtering nodes with contains() in XSLT 2.0


I have a set of XML records about people, and these records encode relationships to other people, as well as to related resources (books, etc.). The basic structure is like this:

<record>
    <name>Smith, Jane</name>
    <relations>
        <relation>Frost, Robert, 1874-1963</relation>
        <relation>Jones, William</relation>            
        <resource>
            <title>Poems</title>
            <author>Frost, Robert</author>
        </resource>
        <resource>
            <title>Some Title</title>
            <author>Author, Some</author>
        </resource>
    </relations>
</record>

I need to process this in a "pull"-style XSLT 2.0 stylesheet so that I can filter out <relation> nodes that contain or start with the text of a later <author> node. The two will not be exact matches because the <relation> nodes contain additional text, usually birth and death dates.

My initial impulse was to try to use a pair of xsl:for-each loops to test the node values, but that doesn't work. This sample stylesheet...

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0">
    <xsl:template match="/">
        <test>
            <xsl:for-each select="record/relations/relation">
                <xsl:variable name="relation" select="."/>
                <xsl:for-each select="../resource/author">
                    <xsl:variable name="resource" select="."/>
                    <relation>
                        <xsl:value-of select="$relation[not(contains(.,$resource))]"/>
                    </relation>
                </xsl:for-each>
            </xsl:for-each>
        </test>
    </xsl:template>
</xsl:stylesheet>

...gives me:

<test>
    <relation/>
    <relation>Frost, Robert, 1874-1963</relation>
    <relation>Jones, William</relation>
    <relation>Jones, William</relation>
</test>

But what I really want is just:

<test>
    <relation>Jones, William</relation>
</test>

How can I compare these two node sets and filter out only the matching nodes? Again, I need to do this within an existing "pull"-style stylesheet that has only one xsl:template, matching the document root. Thanks in advance!


Solution

  • I think you can do this with a single XPath 2.0 "quantified expression". Essentially, you're looking for all the relation elements such that

    not(some $author in following::author satisfies contains(., $author))
    

    Or you could use ../resource/author instead of following::author if you want to keep the checks within each individual relations block.
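
    As a rough sketch of that scoped variant: a quantified expression does not change the context item, so "." inside contains() is still the relation being tested, while $author ranges over the authors in the same relations block. This assumes the record structure shown in the question.

    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <xsl:output method="xml" indent="yes" />

      <xsl:template match="/">
        
      </xsl:template>

    </xsl:stylesheet>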

    XSLT example (using following::author):

    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
      <xsl:output method="xml" indent="yes" />
    
      <xsl:template match="/">
        <test>
          <xsl:sequence select="record/relations/relation[
           not(some $author in following::author satisfies contains(., $author))]" />
        </test>
      </xsl:template>
    
    </xsl:stylesheet>
    

    Output:

    <?xml version="1.0" encoding="UTF-8"?>
    <test>
       <relation>Jones, William</relation>
    </test>
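
    If you'd rather keep the xsl:for-each from your attempt, something along these lines should also work. The nested loops in the question emit one <relation> per relation/author pair, which is why four elements come out. This sketch loops over the relations only and moves the author check into an xsl:if, using current() to refer to the relation being processed. It assumes the same record structure as in the question.

    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <xsl:output method="xml" indent="yes" />

      <xsl:template match="/">
        
      </xsl:template>

    </xsl:stylesheet>

    One caveat for either form: contains() returns true when its second argument is the empty string, so an empty <author> element would filter out every relation. If only prefix matches should count, starts-with() takes its arguments in the same order and could be used in place of contains().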