XML, XSLT Group Same Elements by Title With Child Nodes


I'm trying to build out an appendix/index page from the outline XML output of wkhtmltopdf.

Is there a way to loop through elements and group them by a specific attribute value without using the key() function or XSLT 2.0's for-each-group? I'm asking because of a limitation in the XSL processor used by wkhtmltopdf.

I'm thinking of using the preceding-sibling axis to check whether the title is still the same.

<xsl:for-each select="//o:item">
   <xsl:sort select="@title"></xsl:sort>
   <xsl:variable name="key" select="@title" />
   <xsl:if test="not(preceding-sibling::o:item[@title=$key])">
       <xsl:value-of select="$key"></xsl:value-of>
       <xsl:for-each select="current()/o:item">
         <xsl:element name="{@title}" />
       </xsl:for-each>
    </xsl:if> 
</xsl:for-each>

I'm fairly new to XSLT, so any help would be greatly appreciated.

Here's the outline XML from wkhtmltopdf:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="toc.xsl"?>
<outline xmlns="http://wkhtmltopdf.org/outline">
    <item title="PDF" page="0" link="__WKANCHOR_0" backLink="__WKANCHOR_1">
        <item title="Type1" page="1" link="__WKANCHOR_2" backLink="__WKANCHOR_3">
            <item title="SubType1" page="1" link="__WKANCHOR_4" backLink="__WKANCHOR_5">
                <item title="Collection1" page="1" link="__WKANCHOR_6" backLink="__WKANCHOR_7">
                    <item title="Item1" page="1" link="__WKANCHOR_8" backLink="__WKANCHOR_9"/>
                </item>
                <item title="Collection2" page="1" link="__WKANCHOR_a" backLink="__WKANCHOR_b">
                    <item title="Item2" page="1" link="__WKANCHOR_c" backLink="__WKANCHOR_d"/>
                    <item title="Item3" page="2" link="__WKANCHOR_e" backLink="__WKANCHOR_f"/>
                </item>
            </item>
            <item title="SubType2" page="3" link="__WKANCHOR_g" backLink="__WKANCHOR_h">
                <item title="Collection1" page="3" link="__WKANCHOR_i" backLink="__WKANCHOR_j">
                    <item title="Item4" page="3" link="__WKANCHOR_k" backLink="__WKANCHOR_l"/>
                </item>
            </item>
        </item>
        <item title="Type2" page="4" link="__WKANCHOR_m" backLink="__WKANCHOR_n">
            <item title="SubType1" page="4" link="__WKANCHOR_o" backLink="__WKANCHOR_p">
                <item title="Collection1" page="5" link="__WKANCHOR_u" backLink="__WKANCHOR_v">
                    <item title="Item5" page="4" link="__WKANCHOR_q" backLink="__WKANCHOR_r"/>
                </item>
            </item>
            <item title="SubType3" page="5" link="__WKANCHOR_s" backLink="__WKANCHOR_t">
                <item title="Collection3" page="5" link="__WKANCHOR_u" backLink="__WKANCHOR_v">
                    <item title="Item6" page="5" link="__WKANCHOR_w" backLink="__WKANCHOR_x"/>
                    <item title="Item7" page="5" link="__WKANCHOR_y" backLink="__WKANCHOR_z"/>
                    <item title="Item8" page="5" link="__WKANCHOR_10" backLink="__WKANCHOR_11"/>
                </item>
            </item>
        </item>
    </item>
</outline>

Expected output (the child items of all fourth-level items, grouped by distinct title):

<Collection1>
    <Item1></Item1>
    <Item4></Item4>
    <Item5></Item5>
</Collection1>
<Collection2>
    <Item2></Item2>
    <Item3></Item3>
</Collection2>
<Collection3>
    <Item6></Item6>
    <Item7></Item7>
    <Item8></Item8>
</Collection3>

Solution

  • You can use <xsl:for-each-group> with group-by on @title[contains(., 'Collection')] to form the groups, then loop over current-group() to get the child elements.

    Please try the following XSLT 2.0 solution:

    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:o="http://wkhtmltopdf.org/outline">
        <xsl:output method="xml" indent="yes" />
        <xsl:strip-space elements="*" />
    
        <xsl:template match="/">
            <xsl:for-each-group select="//o:item" group-by="@title[contains(., 'Collection')]">
                <xsl:element name="{current-grouping-key()}">
                    <xsl:for-each select="current-group()/o:item">
                        <xsl:element name="{@title}" />
                    </xsl:for-each>
                </xsl:element>
            </xsl:for-each-group>
        </xsl:template>
    </xsl:stylesheet>
    

    If XSLT 1.0 is being used, an <xsl:key> has to be defined, and the outer loop then runs over the first element of each key group (Muenchian grouping). Below is the XSLT 1.0 solution.

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:o="http://wkhtmltopdf.org/outline">
        <xsl:output method="xml" />
        <xsl:strip-space elements="*" />
    
        <xsl:key name="kTitle" match="o:item" use="@title[contains(.,'Collection')]" />
    
        <xsl:template match="/">
            <xsl:for-each select="//o:item[generate-id() = generate-id(key('kTitle', @title[contains(.,'Collection')])[1])]">
                <xsl:element name="{@title}">
                    <xsl:for-each select="key('kTitle', @title[contains(.,'Collection')])/o:item">
                        <xsl:element name="{@title}" />
                    </xsl:for-each>
                </xsl:element>
            </xsl:for-each>
        </xsl:template>
    </xsl:stylesheet>
    

    Output

    <Collection1>
       <Item1/>
       <Item4/>
       <Item5/>
    </Collection1>
    <Collection2>
       <Item2/>
       <Item3/>
    </Collection2>
    <Collection3>
       <Item6/>
       <Item7/>
       <Item8/>
    </Collection3>
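
Neither stylesheet above fits the constraint stated in the question (no key(), no XSLT 2.0). The following is a minimal XSLT 1.0 sketch that avoids both. It uses the preceding axis instead of preceding-sibling, because the repeated Collection1 items sit under different SubType parents and are therefore never siblings of one another. It assumes that the collections always sit at the fourth nesting level of the outline (as in the sample) and that every title is a valid XML element name; neither assumption is confirmed for wkhtmltopdf output in general.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:o="http://wkhtmltopdf.org/outline">
    <xsl:output method="xml" indent="yes" />

    <xsl:template match="/">
        <!-- Assumption: collections are the fourth-level o:item elements -->
        <xsl:for-each select="/o:outline/o:item/o:item/o:item/o:item">
            <xsl:variable name="title" select="@title" />
            <!-- Emit a group only at the first fourth-level item with this title.
                 A fourth-level item has exactly three o:item ancestors. -->
            <xsl:if test="not(preceding::o:item[@title = $title and count(ancestor::o:item) = 3])">
                <xsl:element name="{$title}">
                    <!-- Collect the children of every fourth-level item sharing the title -->
                    <xsl:for-each select="/o:outline/o:item/o:item/o:item/o:item[@title = $title]/o:item">
                        <xsl:element name="{@title}" />
                    </xsl:for-each>
                </xsl:element>
            </xsl:if>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

For the sample outline this should yield the same Collection1, Collection2 and Collection3 groups as shown above. The preceding axis is rescanned for every candidate item, so the cost grows roughly quadratically with the number of items; that is usually acceptable for an outline but is the reason key() is preferred when it is available.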