xslt xpath xslt-2.0 xpath-2.0 xslt-grouping

How to select a node and its following siblings until the next node of the same kind occurs


This is probably a tricky question.

I have this document:

<html>
   <h1>title1</h1>
   <p>content</p>
   <ul>
      <li>item</li>
   </ul>
   <h1>title2</h1>
   <p>content</p>
   <p>another content</p>
   <h1>title3</h1>
   <ol>
      <li><p>text</p></li>
   </ol>
</html>

In this example the HTML is divided into "sections" by headlines: a headline, then an arbitrary mix of elements, then the next headline.

I want to wrap each of these sections in a div using XSLT 2.0.

So I want to use the <h1> headlines to select the groups. Basically, the grouping rule is:

  • Select an h1 and all of its following siblings until the next h1 occurs.

The stress on "and" is important: the XPath expressions I came up with either select only the nodes between two h1 elements, or select everything from the first h1 to the last and ignore the h1 elements in between. I need the headline itself to be part of the group. I don't think this is difficult to achieve with XPath, but I can't make it work.

To make it clear, here is what the output should look like:

<html>
   <div>
      <h1>title1</h1>
      <p>content</p>
      <ul>
         <li>item</li>
      </ul>
   </div>
   <div>
      <h1>title2</h1>
      <p>content</p>
      <p>another content</p>
   </div>
   <div>
      <h1>title3</h1>
      <ol>
         <li><p>text</p></li>
      </ol>
   </div>
</html>

Solution

  • You can use group-starting-with for this:

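    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" indent="yes"/>
      <!-- Drop whitespace-only text nodes. If the input is indented as in the
           sample, the whitespace before the first h1 would otherwise form a
           group of its own and produce an extra div containing only whitespace. -->
      <xsl:strip-space elements="*"/>

      <xsl:template match="html">
        <xsl:copy>
          <!-- Each h1 child starts a new group that runs up to the next h1 -->
          <xsl:for-each-group select="node()" group-starting-with="h1">
            <div>
              <!-- current-group() holds the h1 and the siblings that follow it -->
              <xsl:copy-of select="current-group()" />
            </div>
          </xsl:for-each-group>
        </xsl:copy>
      </xsl:template>
    </xsl:stylesheet>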
    

    When run on your sample, the result is:

    <html>
       <div>
          <h1>title1</h1>
          <p>content</p>
          <ul>
             <li>item</li>
          </ul>
       </div>
       <div>
          <h1>title2</h1>
          <p>content</p>
          <p>another content</p>
       </div>
       <div>
          <h1>title3</h1>
          <ol>
             <li>
                <p>text</p>
             </li>
          </ol>
       </div>
    </html>
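
    If the contents of each section need further processing rather than a verbatim copy, one possible variation is to apply templates to each group and to wrap only the groups that really start with an h1. This is a sketch under assumptions, not something taken from your real document: the identity template and the xsl:choose are additions for illustration, and the sketch assumes that anything before the first h1 should be passed through unwrapped.

    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" indent="yes"/>
      <xsl:strip-space elements="*"/>

      <!-- Identity template: copies anything not handled by a more specific template -->
      <xsl:template match="@* | node()">
        <xsl:copy>
          <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
      </xsl:template>

      <xsl:template match="html">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:for-each-group select="node()" group-starting-with="h1">
            <xsl:choose>
              <!-- Inside for-each-group the context item is the first item of
                   the group, so self::h1 is true for groups headed by an h1 -->
              <xsl:when test="self::h1">
                <div>
                  <xsl:apply-templates select="current-group()"/>
                </div>
              </xsl:when>
              <!-- Assumption: content before the first h1 is kept without a wrapper -->
              <xsl:otherwise>
                <xsl:apply-templates select="current-group()"/>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:for-each-group>
        </xsl:copy>
      </xsl:template>
    </xsl:stylesheet>

    With this shape you can add further templates (for example one matching p or li) and they will be applied inside each div, because the groups go through apply-templates instead of copy-of.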
    

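    As a bit of extra info, if you wanted to use group-by to do this, each node needs exactly one grouping key that identifies the h1 it belongs to. You could do so like this:

      <xsl:for-each-group select="node()" group-by="generate-id((self::h1, preceding-sibling::h1[1])[1])">
    

    And that should produce the same result. Note that a key such as self::h1 | preceding-sibling::h1[1] is not enough: for every h1 after the first it selects two nodes, so that h1 would also be placed in the previous group, and because grouping keys are compared by value, sections whose headlines have the same text would be merged.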
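
    For completeness, the rule from the question ("an h1 and all of its siblings until the next h1") can also be written as a plain XPath 2.0 expression, without any grouping instruction. The following is only a minimal sketch: the variable name head is my own choice, and it assumes that nodes before the first h1 can be dropped.

    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" indent="yes"/>
      <xsl:strip-space elements="*"/>

      <xsl:template match="html">
        <xsl:copy>
          <xsl:for-each select="h1">
            <!-- head (hypothetical name) is the h1 that opens this section -->
            <xsl:variable name="head" select="."/>
            <div>
              <!-- The h1 itself, plus every following non-h1 sibling whose
                   nearest preceding h1 is this very node -->
              <xsl:copy-of select="$head | following-sibling::node()[not(self::h1)][preceding-sibling::h1[1] is $head]"/>
            </div>
          </xsl:for-each>
        </xsl:copy>
      </xsl:template>
    </xsl:stylesheet>

    The "is" operator compares node identity, so headlines with identical text are still kept apart. This version rescans the siblings for every h1, so for-each-group is usually the better choice for larger documents.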