
XSLT 1 Grouping and combining IDs to csv


I have the following XML document and would like to group the books by their Group element and combine all IDs (titles) of a group into a comma-separated list, using Java and XSLT 1.0.

I would also like a summary element containing the information shared by all books of a series (SeriesInfo), plus two elements in each group: one (e.g. Titles) containing all titles (IDs) of the group as a comma-separated list, and one (e.g. AnyTitle) containing any single title (which one does not matter; first or last is fine).

I've managed to do the grouping with Muenchian grouping, but I don't know how to produce the comma-separated list and the AnyTitle element. I've done some research on this, but the solutions I found were either extremely specific or relied on XSLT 2.0 or higher.

Source XML

<?xml version="1.0" encoding="UTF-8"?>
<Books>

    <Book>
        <Title>Harry Potter and the philosopher's stone</Title>
        <Group>Harry Potter</Group>
        <Author>J.K.R.</Author>
        <Pages>650</Pages>
    </Book>

    <Book>
        <Title>Harry Potter and the chamber of secrets</Title>
        <Group>Harry Potter</Group>
        <Author>J.K.R.</Author>
        <Pages>700</Pages>
    </Book>

    <Book>
        <Title>Lord of the Rings complete edition</Title>
        <Group>Lord of the Rings</Group>
        <Author>J.R.R. Tolkien</Author>
        <Pages>2500</Pages>
    </Book>

</Books>

Destination XML

<?xml version="1.0" encoding="UTF-8"?>
<Serieses>

    <Series>
        <Group>Harry Potter</Group>
        <Titles>Harry Potter and the philosopher's stone,Harry Potter and the chamber of secrets</Titles>
        <AnyTitle>Harry Potter and the chamber of secrets</AnyTitle>

        <Books>
            <Book>
                <Title>Harry Potter and the philosopher's stone</Title>
                <Group>Harry Potter</Group>
                <Pages>650</Pages>
            </Book>

            <Book>
                <Title>Harry Potter and the chamber of secrets</Title>
                <Group>Harry Potter</Group>
                <Pages>700</Pages>
            </Book>
        </Books>

        <SeriesInfo>
            <Author>J.K.R.</Author>
            <Group>Harry Potter</Group>
        </SeriesInfo>
    </Series>

    <Series>
        <Group>Lord of the Rings</Group>
        <Titles>Lord of the Rings complete edition</Titles>
        <AnyTitle>Lord of the Rings complete edition</AnyTitle>

        <Books>
            <Book>
                <Title>Lord of the Rings complete edition</Title>
                <Group>Lord of the Rings</Group>
                <Pages>2500</Pages>
            </Book>
        </Books>

        <SeriesInfo>
            <Author>J.R.R. Tolkien</Author>
            <Group>Lord of the Rings</Group>
        </SeriesInfo>
    </Series>

</Serieses>

Using the following XSLT

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes" />
    <xsl:strip-space elements="*" />

    <xsl:key name="book-by-name" match="Book" use="Group" />

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()" />
        </xsl:copy>
    </xsl:template>

    <xsl:template match="Books">
        <Serieses>
            <xsl:apply-templates
                select="Book[generate-id() = generate-id(key('book-by-name', Group)[1])]"
                mode="group" />
        </Serieses>
    </xsl:template>

    <xsl:template match="Book" mode="group">
        <Series>
            <xsl:copy-of select="Group" />

            <Books>
                <xsl:apply-templates
                    select="key('book-by-name', Group)" />
            </Books>

            <SeriesInfo>
                <xsl:copy-of select="Author" />
                <xsl:copy-of select="Group" />
            </SeriesInfo>

        </Series>
    </xsl:template>

    <xsl:template match="Book">
        <Book>
            <xsl:apply-templates
                select="node()[self::Title|self::Group|self::Pages]" />
        </Book>
    </xsl:template>

</xsl:stylesheet>

I was able to get the following output:

<?xml version="1.0" encoding="UTF-8"?>
<Serieses>

    <Series>
        <Group>Harry Potter</Group>

        <Books>
            <Book>
                <Title>Harry Potter and the philosopher's stone</Title>
                <Group>Harry Potter</Group>
                <Pages>650</Pages>
            </Book>

            <Book>
                <Title>Harry Potter and the chamber of secrets</Title>
                <Group>Harry Potter</Group>
                <Pages>700</Pages>
            </Book>
        </Books>

        <SeriesInfo>
            <Author>J.K.R.</Author>
            <Group>Harry Potter</Group>
        </SeriesInfo>
    </Series>

    <Series>
        <Group>Lord of the Rings</Group>

        <Books>
            <Book>
                <Title>Lord of the Rings complete edition</Title>
                <Group>Lord of the Rings</Group>
                <Pages>2500</Pages>
            </Book>
        </Books>

        <SeriesInfo>
            <Author>J.R.R. Tolkien</Author>
            <Group>Lord of the Rings</Group>
        </SeriesInfo>
    </Series>

</Serieses>

A more recent version of XSLT is not an option for me, because I need to rely on the Java standard library.

EDIT: Clarified what I meant by "any title": it doesn't really matter which one; first or last is fine.


Solution

  • Here's one way you could look at it:

    XSLT 1.0

    <xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:strip-space elements="*"/>
    
    <xsl:key name="book-by-group" match="Book" use="Group" />
    
    <!-- identity transform -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    
    <xsl:template match="/Books">
        <Serieses>
            <xsl:apply-templates select="Book[generate-id() = generate-id(key('book-by-group', Group)[1])]" mode="group" />
        </Serieses>
    </xsl:template>
    
    <xsl:template match="Book" mode="group">
        <xsl:variable name="current-group" select="key('book-by-group', Group)" />
        <Series>
            <xsl:apply-templates select="Group" />
            <Titles>
                <xsl:apply-templates select="$current-group" mode="Title"/>
            </Titles>
            <AnyTitle>
                <xsl:value-of select="$current-group[1]/Title"/>
            </AnyTitle>
            <Books>
                <xsl:apply-templates select="$current-group" />
            </Books>
            <SeriesInfo>
                <xsl:apply-templates select="Author" />
                <xsl:apply-templates select="Group" />
            </SeriesInfo>
        </Series>
    </xsl:template>
    
    <xsl:template match="Book">
        <Book>
            <xsl:apply-templates select="Title | Group | Pages" />
        </Book>
    </xsl:template>
    
    <xsl:template match="Book" mode="Title">
        <xsl:value-of select="Title"/>
        <xsl:if test="position() != last()">,</xsl:if>
    </xsl:template>
    
    </xsl:stylesheet>
    

    This populates the Titles element with a comma-separated list of the group's titles. For the AnyTitle element, I chose the title of the first book in the group.


    Personally, I would prefer to shorten the whole thing to:

    XSLT 1.0

    <xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:strip-space elements="*"/>
    
    <xsl:key name="book-by-group" match="Book" use="Group" />
    
    <xsl:template match="/Books">
        <Serieses>
            <xsl:for-each select="Book[generate-id() = generate-id(key('book-by-group', Group)[1])]">
                <xsl:variable name="current-group" select="key('book-by-group', Group)" />
                <Series>
                    <xsl:copy-of select="Group" />
                    <Titles>
                        <xsl:for-each select="$current-group">
                            <xsl:value-of select="Title"/>
                            <xsl:if test="position() != last()">,</xsl:if>
                        </xsl:for-each>
                    </Titles>
                    <AnyTitle>
                        <xsl:value-of select="$current-group[1]/Title"/>
                    </AnyTitle>
                    <Books>
                        <xsl:for-each select="$current-group">
                            <xsl:copy>
                                <xsl:copy-of select="Title | Group | Pages" />
                            </xsl:copy>
                        </xsl:for-each>
                    </Books>
                    <SeriesInfo>
                        <xsl:copy-of select="Author" />
                        <xsl:copy-of select="Group" />
                    </SeriesInfo>
                </Series>
            </xsl:for-each>
        </Serieses>
    </xsl:template>
    
    </xsl:stylesheet>
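
Since the question asks for Java with only the standard library, here is a minimal sketch of running the grouping through the JDK's built-in JAXP transformer (javax.xml.transform), which handles XSLT 1.0. The class name GroupTitlesDemo, the transform helper, the SAMPLE constant, and the inlined, trimmed stylesheet (it emits only Group, Titles, and AnyTitle) are assumptions made for illustration; they are not part of the stylesheets above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class GroupTitlesDemo {

    // Trimmed-down stylesheet (hypothetical): Muenchian grouping by Group,
    // a comma-separated Titles element, and AnyTitle taken from the first book.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:key name='book-by-group' match='Book' use='Group'/>"
      + "<xsl:template match='/Books'>"
      + "<Serieses>"
      + "<xsl:for-each select=\"Book[generate-id() = generate-id(key('book-by-group', Group)[1])]\">"
      + "<xsl:variable name='current-group' select=\"key('book-by-group', Group)\"/>"
      + "<Series>"
      + "<xsl:copy-of select='Group'/>"
      + "<Titles>"
      + "<xsl:for-each select='$current-group'>"
      + "<xsl:value-of select='Title'/>"
      // Emit a comma after every title except the last one in the group.
      + "<xsl:if test='position() != last()'>,</xsl:if>"
      + "</xsl:for-each>"
      + "</Titles>"
      + "<AnyTitle><xsl:value-of select='$current-group[1]/Title'/></AnyTitle>"
      + "</Series>"
      + "</xsl:for-each>"
      + "</Serieses>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Sample input using the books from the question (Author and Pages omitted).
    static final String SAMPLE =
        "<Books>"
      + "<Book><Title>Harry Potter and the philosopher's stone</Title><Group>Harry Potter</Group></Book>"
      + "<Book><Title>Harry Potter and the chamber of secrets</Title><Group>Harry Potter</Group></Book>"
      + "<Book><Title>Lord of the Rings complete edition</Title><Group>Lord of the Rings</Group></Book>"
      + "</Books>";

    // Applies the stylesheet to the given XML string and returns the result as a string.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrap the checked exception so callers of this sketch stay simple.
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SAMPLE));
    }
}
```

In real code you would more likely load the stylesheet from a file or classpath resource instead of a string constant. If the same stylesheet is applied many times, compiling it once with TransformerFactory.newTemplates and creating a Transformer per run avoids re-parsing it.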