html xml xslt tei

Simple unflattening of an HTML file through XSL


I looked around for ways to unflatten a document through XSL, but none of them really works for me, although I believe my case is pretty simple. I have a collection of HTML files, all with the same structure, that I would like to unflatten through an XSL transformation. Basically, I want to wrap in a <div> element all the elements that follow a <p class='subtitle'>, up to the next <p class='subtitle'>, and ideally still apply a transformation to each element individually, though that part is optional (see below).

The source file looks like this:

[...some stuff on the page]
<p class='header'>Some text</p>
<p class='subtitle'>Subtitle 1</p>
<p class='content'>First paragraph of part 1, with some <span>Inside</span> and other 
nested elements, on multiple levels</p>
<ul>a list with <li> inside</ul>
<p class='content'>Second paragraph of part 1</p>
<img src='xyz.jpg'/>
<p class='content'>Third paragraph of part 1</p>
<p class='subtitle'>Subtitle 2</p>
<p class='content'>First paragraph of part 2</p>
<p class='content'>Second paragraph of part 2</p>
<p class='subtitle'>Subtitle 3 
[and so on…]

I would like to turn this into:

<div n='section1'>
    <head>Subtitle 1</head>
    <p>First paragraph of part 1, with some <span>Inside</span> and other
     nested elements, on multiple levels</p>
    <ul>a list with <li> inside</ul>
    <p>Second paragraph of part 1</p>
    <picture source='xyz.jpg'/>
    <p>Third paragraph of part 1</p>
</div>
<div n="section2">
    <head>Subtitle 2</head>
    <p>First paragraph of part 2</p>
    <p>Second paragraph of part 2</p>
</div>
<div n="section3">
    <head>Subtitle 3</head>
    [and so on…]

I cannot find my way around this issue. Even a first step that only unflattens the HTML file (strictly copying the elements into the div without transforming them) would already be amazing.

Thanks in advance!


Solution

  • This is a classic positional grouping problem. To get you started:

    <xsl:template match="body">
      <body>
        <xsl:for-each-group select="*" group-starting-with="p[@class='subtitle']">
          <xsl:choose>
            <xsl:when test="@class='subtitle'">
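              <!-- Count preceding subtitles rather than using position(), which would also count a leading group of elements before the first subtitle. -->
              <div n="section{count(preceding-sibling::p[@class='subtitle']) + 1}">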
                <head>{.}</head>
                <xsl:apply-templates select="tail(current-group())"/>
              </div>
            </xsl:when>
            <xsl:otherwise>
              <xsl:apply-templates select="current-group()"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:for-each-group>
      </body>
    </xsl:template>
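
    For orientation, here is a sketch of how the body template might sit inside a complete XSLT 3.0 stylesheet. Several things here are assumptions rather than part of the answer: that the input is well-formed XML with elements in no namespace (for XHTML you would likely add xpath-default-namespace), that the per-element rules for p and img should follow the desired output in the question, and that sections are numbered by counting preceding subtitles so that content before the first subtitle does not shift the numbering.

    ```xslt
    <xsl:stylesheet version="3.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    expand-text="yes">

      <!-- Copy anything that has no more specific rule (XSLT 3.0 identity shorthand). -->
      <xsl:mode on-no-match="shallow-copy"/>

      <!-- Group the children of body; each subtitle paragraph opens a new group. -->
      <xsl:template match="body">
        <body>
          <xsl:for-each-group select="*" group-starting-with="p[@class='subtitle']">
            <xsl:choose>
              <!-- The context item is the first element of the current group. -->
              <xsl:when test="@class='subtitle'">
                <!-- Assumption: number sections by counting preceding subtitles. -->
                <div n="section{count(preceding-sibling::p[@class='subtitle']) + 1}">
                  <head>{.}</head>
                  <xsl:apply-templates select="tail(current-group())"/>
                </div>
              </xsl:when>
              <xsl:otherwise>
                <!-- Elements before the first subtitle stay outside any div. -->
                <xsl:apply-templates select="current-group()"/>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:for-each-group>
        </body>
      </xsl:template>

      <!-- Assumed from the desired output: content paragraphs lose their class. -->
      <xsl:template match="p[@class='content']">
        <p><xsl:apply-templates/></p>
      </xsl:template>

      <!-- Assumed from the desired output: img becomes picture with a source attribute. -->
      <xsl:template match="img">
        <picture source="{@src}"/>
      </xsl:template>

    </xsl:stylesheet>
    ```

    Removing the two element-specific templates gives the "copy only" first step mentioned in the question, since the shallow-copy mode then reproduces every grouped element unchanged.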
    

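    Note that xsl:for-each-group requires XSLT 2.0 or later, and that tail() and the {.} text value template (enabled with expand-text="yes") require XSLT 3.0; with a 2.0 processor, use subsequence(current-group(), 2) and <xsl:value-of select="."/> instead. It's considerably more difficult with XSLT 1.0.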
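
    If you are limited to XSLT 1.0, one common workaround is to index each element by its nearest preceding subtitle with xsl:key. The stylesheet below is only a sketch of that idea, not something taken from the answer: the key name by-subtitle is made up for illustration, the input is assumed to be well-formed XML with elements in no namespace, and whitespace text nodes directly inside body are not copied.

    ```xslt
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <!-- Index every non-subtitle child of body by the generated id of the
           nearest preceding subtitle ('' when there is none). -->
      <xsl:key name="by-subtitle"
               match="body/*[not(self::p[@class='subtitle'])]"
               use="generate-id(preceding-sibling::p[@class='subtitle'][1])"/>

      <!-- Identity rule: copy anything that has no more specific template. -->
      <xsl:template match="@* | node()">
        <xsl:copy>
          <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
      </xsl:template>

      <xsl:template match="body">
        <body>
          <!-- Elements before the first subtitle have the empty string as key value. -->
          <xsl:apply-templates select="key('by-subtitle', '')"/>
          <!-- Each subtitle then builds its own section. -->
          <xsl:apply-templates select="p[@class='subtitle']"/>
        </body>
      </xsl:template>

      <!-- A subtitle becomes a div holding a head plus the elements indexed under it. -->
      <xsl:template match="body/p[@class='subtitle']">
        <div n="section{count(preceding-sibling::p[@class='subtitle']) + 1}">
          <head><xsl:value-of select="."/></head>
          <xsl:apply-templates select="key('by-subtitle', generate-id())"/>
        </div>
      </xsl:template>

    </xsl:stylesheet>
    ```

    Per-element rules, such as renaming img or dropping the class attribute from content paragraphs, can be added as ordinary templates alongside the identity rule.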