Distinct values with XSLT 1.0 when XPath has multiple criteria


Yet another question about getting distinct values using XSLT 1.0. Here's a stupid, made-up example that should illustrate my problem.

<?xml version="1.0" encoding="UTF-8"?>
<moviesByYear>
    <year1994>
        <movie>
            <genre>Action</genre>
            <director>A</director>
        </movie>
    </year1994>
    <year1994>
        <movie>
            <genre>Comedy</genre>
            <director>A</director>
        </movie>
    </year1994>
    <year1994>
        <movie>
            <genre>Drama</genre>
            <director>B</director>
        </movie>
    </year1994>
    <year1994>
        <movie>
            <genre>Thriller</genre>
            <director>C</director>
        </movie>
    </year1994>
    <year1995>
        <movie>
            <genre>Action</genre>
            <director>A</director>
        </movie>
    </year1995>
    <year1995>
        <movie>
            <genre>Comedy</genre>
            <director>C</director>
        </movie>
    </year1995>
    <year1996>
        <movie>
            <genre>Thriller</genre>
            <director>A</director>
        </movie>
    </year1996>
</moviesByYear>

Now let's say that I'd like to list all years that produced movies that are either comedies or directed by director B. I use the following stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <xsl:output method="text" encoding="UTF-8" indent="no"/>
    <xsl:template match="/">
        <xsl:for-each select="/moviesByYear/*[movie/genre='Comedy' or movie/director='B']">
            <xsl:value-of select="name()"/>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

This gives me the following output:

year1994year1994year1995

I have not yet found any solution for getting distinct values that would work here. For example, using name(.) != name(following-sibling::*) causes year1994 to be excluded altogether.
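
For reference, that test fails because, in XPath 1.0, name() applied to a node-set uses only the first node in document order, so name(following-sibling::*) inspects just the immediately following sibling. Both selected year1994 elements are directly followed by another year1994, so both are dropped. A minimal key-free sketch, assuming the year elements are always siblings as in the example above, is to skip any selected node that has an earlier sibling with the same name that also meets the criteria:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" encoding="UTF-8"/>
    <xsl:template match="/">
        <xsl:for-each select="/moviesByYear/*[movie/genre='Comedy' or movie/director='B']">
            <!-- current() is the node being iterated; emit its name only if no
                 earlier sibling with the same name also satisfies the criteria -->
            <xsl:if test="not(preceding-sibling::*[name() = name(current())][movie/genre='Comedy' or movie/director='B'])">
                <xsl:value-of select="name()"/>
            </xsl:if>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

With the first example input this prints year1994year1995. The criteria have to be repeated inside the test, and the sibling scan is quadratic, so this only suits small inputs.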

In my real-world case I have a complex XML structure and an XPath with many criteria that picks out a number of nodes, from which I need to get an output of distinct node names.

Update: michael.hor257k gave an elegant solution to this, but using it I faced a problem with xsl:key. Allow me to alter the scenario a bit:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <genres>
        <genre>Action</genre>
        <genre>Comedy</genre>
        <genre>Drama</genre>
        <genre>Thriller</genre>
    </genres>
    <moviesByYear>
        <year1994>
            <movie>
                <genre>Action</genre>
                <director>A</director>
            </movie>
        </year1994>
        <year1994>
            <movie>
                <genre>Comedy</genre>
                <director>A</director>
            </movie>
        </year1994>
        <year1994>
            <movie>
                <genre>Drama</genre>
                <director>B</director>
            </movie>
        </year1994>
        <year1994>
            <movie>
                <genre>Thriller</genre>
                <director>C</director>
            </movie>
        </year1994>
        <year1995>
            <movie>
                <genre>Action</genre>
                <director>A</director>
            </movie>
        </year1995>
        <year1995>
            <movie>
                <genre>Comedy</genre>
                <director>C</director>
            </movie>
        </year1995>
        <year1996>
            <movie>
                <genre>Thriller</genre>
                <director>A</director>
            </movie>
        </year1996>
    </moviesByYear>
</root>

Now let's say that I want a list of genres, each of which lists years that produced movies of that genre or movies directed by director B. Stylesheet:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="urn:schemas-microsoft-com:xslt"
extension-element-prefixes="exsl">
<xsl:output method="text" version="1.0" encoding="UTF-8" indent="no"/>

<xsl:template match="/">
    <xsl:for-each select="/root/genres/genre">
        <xsl:call-template name="output">
            <xsl:with-param name="genre">
                <xsl:value-of select="."/>
            </xsl:with-param>
        </xsl:call-template>
    </xsl:for-each>
</xsl:template>

<xsl:param name="director" select="'B'"/>

<xsl:key name="year" match="year" use="." />

<xsl:template name="output">
    <xsl:param name="genre"/>

    <!-- first pass -->
    <xsl:variable name="years">
        <xsl:for-each select="/root/moviesByYear/*/movie[genre=$genre or director=$director]"> 
            <year><xsl:value-of select="local-name(..)"/></year>
        </xsl:for-each>
    </xsl:variable>
    <xsl:variable name="years-set" select="exsl:node-set($years)" />

    <!-- final pass -->
    <xsl:value-of select="concat($genre, ': ')"/> 
    <xsl:for-each select="$years-set/year[count(. | key('year', .)[1]) = 1]">
        <xsl:value-of select="."/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>

</xsl:template>

</xsl:stylesheet>

This produces the following output:

Action: year1994year1995
Comedy: 
Drama: 
Thriller: year1996

As you can see, each year is listed only once across the entire output, rather than once per genre. The desired output would have been:

Action: year1994year1995
Comedy: year1994year1995
Drama: year1994
Thriller: year1994year1996

Solution

  • Here's a different implementation of Muenchian grouping - one that allows you to parametrize the criteria by which the movies are selected.

    <xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    extension-element-prefixes="exsl">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    
    <xsl:param name="genre" select="'Comedy'"/>
    <xsl:param name="director" select="'B'"/>
    
    <xsl:key name="year" match="year" use="." />
    
    <xsl:template match="/">
    
        <!-- first pass -->
        <xsl:variable name="years">
            <xsl:for-each select="moviesByYear/*/movie[genre=$genre or director=$director]"> 
                <year><xsl:value-of select="local-name(..)"/></year>
            </xsl:for-each>
        </xsl:variable>
        <xsl:variable name="years-set" select="exsl:node-set($years)" />
    
        <!-- final pass -->
        <output>
            <xsl:for-each select="$years-set/year[count(. | key('year', .)[1]) = 1]">
                <xsl:copy-of select="."/>
            </xsl:for-each>
        </output>
    
    </xsl:template>
    
    </xsl:stylesheet>
    

    When the above is applied to your example input, the result is:

    <?xml version="1.0" encoding="UTF-8"?>
    <output>
       <year>year1994</year>
       <year>year1995</year>
    </output>
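
    The predicate count(. | key('year', .)[1]) = 1 keeps a year only when it is the very node that the key returns first for its value: the union of a node with itself has exactly one member. The same test is often written with generate-id() instead. A sketch of that variant, under the same assumption as above that the processor supports the EXSLT node-set() function:

    ```xml
    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    extension-element-prefixes="exsl">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

    <xsl:param name="genre" select="'Comedy'"/>
    <xsl:param name="director" select="'B'"/>

    <xsl:key name="year" match="year" use="." />

    <xsl:template match="/">
        <!-- first pass: one <year> per matching movie, duplicates included -->
        <xsl:variable name="years">
            <xsl:for-each select="moviesByYear/*/movie[genre=$genre or director=$director]">
                <year><xsl:value-of select="local-name(..)"/></year>
            </xsl:for-each>
        </xsl:variable>
        <!-- final pass: keep a <year> only if it is the first one in its key group -->
        <output>
            <xsl:for-each select="exsl:node-set($years)/year[generate-id() = generate-id(key('year', .)[1])]">
                <xsl:copy-of select="."/>
            </xsl:for-each>
        </output>
    </xsl:template>

    </xsl:stylesheet>
    ```

    Both forms select the same nodes; which one to use is a matter of taste.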
    

    Edit:

    With regard to your modified input, I believe I would do it this way:

    <xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    extension-element-prefixes="exsl">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    
    <xsl:param name="director" select="'B'"/>
    
    <xsl:key name="movies-by-genre" match="movie" use="genre" />
    <xsl:key name="movies-by-director" match="movie" use="director" />
    <xsl:key name="year" match="year" use="." />
    
    <xsl:template match="/">
        <output>
            <xsl:apply-templates select="root/genres/genre"/>
        </output>
    </xsl:template>
    
    <xsl:template match="genre">
        <!-- first pass -->
        <xsl:variable name="years">
            <xsl:for-each select="key('movies-by-genre', .) | key('movies-by-director', $director)"> 
                <year><xsl:value-of select="local-name(..)"/></year>
            </xsl:for-each>
        </xsl:variable>
        <xsl:variable name="years-set" select="exsl:node-set($years)" />
        <!-- final pass -->
        <genre name="{.}">
            <xsl:for-each select="$years-set/year[count(. | key('year', .)[1]) = 1]">
                <xsl:copy-of select="."/>
            </xsl:for-each>
        </genre>
    </xsl:template>
    
    </xsl:stylesheet>
    

    The result here is:

    <?xml version="1.0" encoding="UTF-8"?>
    <output>
       <genre name="Action">
          <year>year1994</year>
          <year>year1995</year>
       </genre>
       <genre name="Comedy">
          <year>year1994</year>
          <year>year1995</year>
       </genre>
       <genre name="Drama">
          <year>year1994</year>
       </genre>
       <genre name="Thriller">
          <year>year1994</year>
          <year>year1996</year>
       </genre>
    </output>
    

    Note: the two added keys are for efficiency only - they are not required for the main purpose here.
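
    To illustrate that note, here is a sketch of the same stylesheet without the two extra keys. The first pass selects the movies with a plain path expression, in which current() refers to the genre element being processed by the template. The EXSLT node-set() function is still assumed to be available:

    ```xml
    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    extension-element-prefixes="exsl">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

    <xsl:param name="director" select="'B'"/>

    <xsl:key name="year" match="year" use="." />

    <xsl:template match="/">
        <output>
            <xsl:apply-templates select="root/genres/genre"/>
        </output>
    </xsl:template>

    <xsl:template match="genre">
        <!-- first pass: no lookup keys, just a filtered path -->
        <xsl:variable name="years">
            <xsl:for-each select="/root/moviesByYear/*/movie[genre=current() or director=$director]">
                <year><xsl:value-of select="local-name(..)"/></year>
            </xsl:for-each>
        </xsl:variable>
        <xsl:variable name="years-set" select="exsl:node-set($years)" />
        <!-- final pass: Muenchian de-duplication, unchanged -->
        <genre name="{.}">
            <xsl:for-each select="$years-set/year[count(. | key('year', .)[1]) = 1]">
                <xsl:copy-of select="."/>
            </xsl:for-each>
        </genre>
    </xsl:template>

    </xsl:stylesheet>
    ```

    The movies selected are the same as with the keys; the only difference is that the path is re-evaluated against every movie for each genre.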


    Edit 2:

    On second thought, we could do this all in a single pass, thus (hopefully) avoiding the issues Xalan and MSXML have with processing a variable - but still using Muenchian grouping:

    <xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    
    <xsl:param name="director" select="'B'"/>
    
    <xsl:key name="year" match="moviesByYear/*" use="local-name()" />
    
    <xsl:template match="/">
        <output>
            <xsl:apply-templates select="root/genres/genre"/>
        </output>
    </xsl:template>
    
    <xsl:template match="genre">
        <xsl:variable name="genre" select="." />
        <genre name="{$genre}">
            <xsl:for-each select="../../moviesByYear/* 
            [count(. | key('year', local-name())[1]) = 1]
            [key('year', local-name())/movie[genre=$genre or director=$director]]">
                <year>
                    <xsl:value-of select="local-name()"/>
                </year>  
            </xsl:for-each>
        </genre>
    </xsl:template>
    
    </xsl:stylesheet>
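
    If plain text in the layout shown in the question is preferred over XML, the single-pass approach can be adapted as sketched below. Only the output method and the literal result elements change; the key and the two predicates are the same as above:

    ```xml
    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" encoding="UTF-8"/>

    <xsl:param name="director" select="'B'"/>

    <xsl:key name="year" match="moviesByYear/*" use="local-name()" />

    <xsl:template match="/">
        <xsl:apply-templates select="root/genres/genre"/>
    </xsl:template>

    <xsl:template match="genre">
        <xsl:variable name="genre" select="." />
        <xsl:value-of select="concat($genre, ': ')"/>
        <!-- first predicate: the first element of each distinct name;
             second predicate: some element of that name has a matching movie -->
        <xsl:for-each select="../../moviesByYear/*
        [count(. | key('year', local-name())[1]) = 1]
        [key('year', local-name())/movie[genre=$genre or director=$director]]">
            <xsl:value-of select="local-name()"/>
        </xsl:for-each>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>

    </xsl:stylesheet>
    ```

    Applied to the modified input, this produces the four lines given as the desired output in the question.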