Use XSLT 1.0 to group XML elements into buckets, in order, based on some criteria


Say I had some XML that I wanted to convert to HTML. The XML is divided into ordered sections:

<?xml version="1.0" encoding="utf-8"?>
<root>
  <section attr="someCriteria">
    <h1>Title 1</h1>
    <p>paragraph 1-1</p>
    <p>paragraph 1-2</p>
  </section>
  <section attr="someOtherCriteria">
    <h3>Subtitle 2</h3>
    <ul>
      <li>list item 2-1</li>
      <li>list item 2-2</li>
      <li>list item 2-3</li>
      <li>list item 2-4</li>
    </ul>
  </section>
  <section attr="anotherSetOfCriteria">
    <warning>
      Warning: This product could kill you
    </warning>
  </section>
  <section attr="evenMoreCriteria">
    <disclaimer>
      You were warned
    </disclaimer>
  </section>
  <section attr="criteriaSupreme">
    <p>Copyright 1999-2011</p>
  </section>
</root>

I have several of these XML documents. I need to group and transform these sections based on criteria. There will be two different kinds of buckets.

  • The first section goes in a bucket (e.g. <div class="FormatOne"></div>).
  • If the second section meets the criteria for the "FormatOne" bucket, it goes in the same bucket.
  • If the third section requires a different bucket (e.g. <div class="FormatTwo"></div>), a new bucket is created and the section's contents are placed in it.
  • If the fourth section requires "FormatOne" again (which differs from the previous bucket's format), another new bucket is created and the section's contents are placed in it.
  • And so on: each section goes into the same bucket as the previous section if both have the same format; otherwise a new bucket is created.

So for each document, depending on the logic for separating buckets, the document may end up like this:

<body>
  <div class="FormatOne">
    <h1>Title 1</h1>
    <p>paragraph 1-1</p>
    <p>paragraph 1-2</p>
    <h3>Subtitle 2</h3>
    <ul>
      <li>list item 2-1</li>
      <li>list item 2-2</li>
      <li>list item 2-3</li>
      <li>list item 2-4</li>
    </ul>
  </div>
  <div class="FormatTwo">
    <span class="warningText">
      Warning: This product could kill you
    </span>
  </div>
  <div class="FormatOne">
    <span class="disclaimerText"> You were warned</span>
    <p class="copyright">Copyright 1999-2011</p>
  </div>
</body>

or like this:

<body>
  <div class="FormatOne">
    <h1>Title 1</h1>
    <p>paragraph 1-1</p>
    <p>paragraph 1-2</p>
    <h3>Subtitle 2</h3>
  </div>
  <div class="FormatTwo">
    <ul>
      <li>list item 2-1</li>
      <li>list item 2-2</li>
      <li>list item 2-3</li>
      <li>list item 2-4</li>
    </ul>
  </div>
  <div class="FormatOne">
    <span class="warningText">
      Warning: This product could kill you
    </span>
    <span class="disclaimerText"> You were warned</span>
    <p class="copyright">Copyright 1999-2011</p>
  </div>
</body>

or even this:

<body>
  <div class="FormatOne">
    <h1>Title 1</h1>
    <p>paragraph 1-1</p>
    <p>paragraph 1-2</p>
    <h3>Subtitle 2</h3>
    <ul>
      <li>list item 2-1</li>
      <li>list item 2-2</li>
      <li>list item 2-3</li>
      <li>list item 2-4</li>
    </ul>
    <span class="warningText">
      Warning: This product could kill you
    </span>
    <span class="disclaimerText"> You were warned</span>
    <p class="copyright">Copyright 1999-2011</p>
  </div>
</body>

depending on how the sections are defined.

Is there a way to use XSLT 1.0 to perform this kind of grouping?

Any help would be great. Thanks!


Solution

  • I came up with a solution that involves hitting each section sequentially. The processing of each section is broken into two parts: a "shell" and a "contents" portion. The "shell" is responsible for rendering the <div class="FormatOne">...</div> bits, and the "contents" is responsible for rendering the actual contents of the current section and all following sections until a non-matching section is found.

    When a non-matching section is found, control reverts to the "shell" template for that section.

    This gives an interesting bit of flexibility: the "shell" templates may be very aggressive in what they match, while the "contents" templates may be more discerning. Specifically, in your first example output, the warning element needs to appear as <span class="warningText">...</span>, and this is accomplished with a more closely matching "contents" template.

    Every "contents" template, after rendering the contents of its current section, calls a named template that checks whether the next section belongs in the same bucket. This consolidates the rules for what qualifies as a "matching" section in one place per format.

    Here is my code, built to replicate what you asked for in your first example:

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="xml" />
    
        <!-- Entry point: start the "shell" chain at the first section. -->
        <xsl:template match="/">
            <body>
                <xsl:apply-templates select="/root/section[1]" mode="shell" />
            </body>
        </xsl:template>
    
        <xsl:template match="section[
            @attr = 'someCriteria' or
            @attr = 'someOtherCriteria' or
            @attr = 'evenMoreCriteria' or
            @attr = 'criteriaSupreme']" mode="shell">
    
            <div class="FormatOne">
                <xsl:apply-templates select="." mode="contents" />
            </div>
    
            <!-- Hand off to the next section that is not FormatOne, if any. -->
            <xsl:apply-templates select="following-sibling::section[
                @attr != 'someCriteria' and
                @attr != 'someOtherCriteria' and
                @attr != 'evenMoreCriteria' and
                @attr != 'criteriaSupreme'][1]" mode="shell" />
    
        </xsl:template>
    
        <!-- Continue into the immediately following section only if it is also FormatOne. -->
        <xsl:template name="nextFormatOne">
            <xsl:variable name="next" select="following-sibling::section[1]" />
            <xsl:if test="$next[
                @attr = 'someCriteria' or
                @attr = 'someOtherCriteria' or
                @attr = 'evenMoreCriteria' or
                @attr = 'criteriaSupreme']">
                <xsl:apply-templates select="$next" mode="contents" />
            </xsl:if>
        </xsl:template>
    
        <xsl:template match="section[
            @attr = 'someCriteria' or
            @attr = 'someOtherCriteria']" mode="contents">
    
            <xsl:copy-of select="*" />
    
            <xsl:call-template name="nextFormatOne" />
        </xsl:template>
    
        <xsl:template match="section[@attr = 'evenMoreCriteria']" mode="contents">
            <span class="disclaimerText">
                <xsl:value-of select="disclaimer" />
            </span>
    
            <xsl:call-template name="nextFormatOne" />
        </xsl:template>
    
        <xsl:template match="section[@attr = 'criteriaSupreme']" mode="contents">
            <p class="copyright">
                <xsl:value-of select="p" />
            </p>
    
            <xsl:call-template name="nextFormatOne" />
        </xsl:template>
    
        <xsl:template match="section[@attr = 'anotherSetOfCriteria']" mode="shell">
            <div class="FormatTwo">
                <xsl:apply-templates select="." mode="contents" />
            </div>
            <xsl:apply-templates select="
                following-sibling::section[@attr != 'anotherSetOfCriteria'][1]"
                mode="shell" />
        </xsl:template>
    
        <!-- Continue into the immediately following section only if it is also FormatTwo. -->
        <xsl:template name="nextFormatTwo">
            <xsl:variable name="next" select="following-sibling::section[1]" />
            <xsl:if test="$next[@attr = 'anotherSetOfCriteria']">
                <xsl:apply-templates select="$next" mode="contents" />
            </xsl:if>
        </xsl:template>
    
        <xsl:template
            match="section[@attr = 'anotherSetOfCriteria']"
            mode="contents">
    
            <span class="warningText">
                <xsl:value-of select="warning" />
            </span>
    
            <xsl:call-template name="nextFormatTwo" />
        </xsl:template>
    
    </xsl:stylesheet>
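
    The criteria lists above are repeated in several predicates, which is how a typo in one copy can slip through. As a rough sketch of one way to keep them in a single place (not part of the stylesheet above), the variant below decides each section's bucket in one template and walks the siblings one at a time. The mode names format, run and skip are made up for this illustration, and the mapping from @attr to bucket is only an assumption based on the first example output. To keep it short, it copies section contents as they are rather than producing the warningText and disclaimerText spans.

    ```xml
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="xml" />

        <xsl:template match="/">
            <body>
                <xsl:apply-templates select="/root/section[1]" mode="shell" />
            </body>
        </xsl:template>

        <!-- Assumed mapping from @attr to bucket name; the only place criteria live. -->
        <xsl:template match="section" mode="format">
            <xsl:choose>
                <xsl:when test="@attr = 'anotherSetOfCriteria'">FormatTwo</xsl:when>
                <xsl:otherwise>FormatOne</xsl:otherwise>
            </xsl:choose>
        </xsl:template>

        <!-- Opens a bucket for the current section, fills it, then finds the next bucket. -->
        <xsl:template match="section" mode="shell">
            <xsl:variable name="format">
                <xsl:apply-templates select="." mode="format" />
            </xsl:variable>
            <div class="{$format}">
                <xsl:apply-templates select="." mode="run">
                    <xsl:with-param name="format" select="string($format)" />
                </xsl:apply-templates>
            </div>
            <xsl:apply-templates select="following-sibling::section[1]" mode="skip">
                <xsl:with-param name="format" select="string($format)" />
            </xsl:apply-templates>
        </xsl:template>

        <!-- Emits this section and continues while the format stays the same. -->
        <xsl:template match="section" mode="run">
            <xsl:param name="format" />
            <xsl:variable name="mine">
                <xsl:apply-templates select="." mode="format" />
            </xsl:variable>
            <xsl:if test="$mine = $format">
                <xsl:apply-templates select="." mode="contents" />
                <xsl:apply-templates select="following-sibling::section[1]" mode="run">
                    <xsl:with-param name="format" select="$format" />
                </xsl:apply-templates>
            </xsl:if>
        </xsl:template>

        <!-- Skips sections already emitted; starts a new shell when the format changes. -->
        <xsl:template match="section" mode="skip">
            <xsl:param name="format" />
            <xsl:variable name="mine">
                <xsl:apply-templates select="." mode="format" />
            </xsl:variable>
            <xsl:choose>
                <xsl:when test="$mine = $format">
                    <xsl:apply-templates select="following-sibling::section[1]" mode="skip">
                        <xsl:with-param name="format" select="$format" />
                    </xsl:apply-templates>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:apply-templates select="." mode="shell" />
                </xsl:otherwise>
            </xsl:choose>
        </xsl:template>

        <!-- Simplified contents rule: copy the section's children unchanged. -->
        <xsl:template match="section" mode="contents">
            <xsl:copy-of select="*" />
        </xsl:template>
    </xsl:stylesheet>
    ```

    With the sample document this should produce three div elements, in the order FormatOne, FormatTwo, FormatOne. Changing which sections share a bucket then means editing only the format-mode template. The more specific "contents" templates from the stylesheet above could replace the generic one here, as long as their calls to nextFormatOne and nextFormatTwo are removed, since the run mode already handles the hand-off.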