
XSLT: Problem with complex grouping


I'm struggling to get 'for-each-group' working. I recently switched to XSLT 2.0 and still have some work to do before I fully understand it. I'm trying to clean up some files exported from FrameMaker MIF (flat XML). In most cases the data is pretty clean, but the exceptions are driving me nuts, so I've combined some typical examples in the XML below. The example concerns the underline tags. In principle the files are built as follows: once you see an <Underline/> tag, all following siblings need to be underlined until you reach the <EndUnderline/> tag. My aim is to get rid of both tags and wrap all siblings in between in a single <u> tag. The problem is that there can be subsequent <Underline/> tags, which need to be ignored until the actual <EndUnderline/> tag is reached.

To make this more concrete, here is a simplified XML file:

<TestFile>
<!-- Para tag containing no underline tags -->
 <Para>
  <Content>[text_not_underlined]</Content>
 </Para>

<!-- correct encapsulation from source -->
<Para>
 <Content>
  <Underline/>[text_to_be_underlined]<EndUnderline/>
  <p>Some test data</p>
 </Content>
</Para>

<!-- extra underline tag that should be ignored -->
<Para>
 <Content>
  <Underline/>[text_to_be_underlined]
  <Underline/>
  <EndUnderline/>
  <p>Some other test data</p>
 </Content>
</Para>

<!-- some extra end underline tags that should be ignored -->
<Para>
 <Content>
  <EndUnderline/>[no_longer_underline]<EndUnderline/>
  <p>: More data</p>
 </Content>
</Para>

</TestFile> 

This is how far I've got with my XSLT:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>

<xsl:template match="/">
 <xsl:copy>
  <xsl:apply-templates select="@*|node()"/>
 </xsl:copy>
</xsl:template>

<xsl:template match="@*|node()">
 <xsl:copy>
  <xsl:apply-templates select="@*|node()"/>
 </xsl:copy>
</xsl:template>

<xsl:template match="Content">
 <xsl:copy>
  <xsl:for-each-group select="node()" group-ending-with="EndUnderline">
   <xsl:choose>
    <xsl:when test="current-grouping-key()">
     <xsl:variable name="start" select="current-group()[self::Underline][1]"/>
     <xsl:copy-of select="current-group()[$start >> .]"/>
     <u>
      <xsl:copy-of select="current-group()[. >> $start][not(self::Underline)][not(self::EndUnderline)]"/>
     </u>
    </xsl:when>
    <xsl:otherwise>
     <xsl:copy-of select="current-group()"/>
    </xsl:otherwise>
   </xsl:choose>
  </xsl:for-each-group>
 </xsl:copy>
</xsl:template>
</xsl:stylesheet>

And this is the result:

<TestFile>

<!-- Para tag containing no underline tags -->
<Para>
 <Content>
  <u/>
 </Content>
</Para>

<!-- correct encapsulation from source -->
<Para>
 <Content>
  <u>[text_to_be_underlined]</u>
  <u/>
 </Content>
</Para>

<!-- extra underline tag that should be ignored -->
<Para>
 <Content>
  <u>[text_to_be_underlined]</u>
  <u/>
 </Content>
</Para>

<!-- some extra end underline tags that should be ignored -->
<Para>
 <Content>
  <u/>
  <u/>
 </Content>
</Para>
</TestFile>

This is what I'm aiming for:

<TestFile>
 <!-- Para tag containing no underline tags -->
 <Para>
  <Content>[text_not_underlined]</Content>
 </Para>

<!-- correct encapsulation from source -->
<Para>
 <Content>
  <u>[text_to_be_underlined]</u>
  <p>Some test data</p>
 </Content>
</Para>

<!-- extra underline tag that should be ignored -->
<Para>
 <Content>
  <u>[text_to_be_underlined]</u>
  <p>Some other test data</p>
 </Content>
</Para>
<!-- some extra end underline tags that should be ignored -->
<Para>
 <Content>
  [no_longer_underline]
  <p>: More data</p>
 </Content>
</Para>
</TestFile>

Thanks in advance for any tips that can point me in the right direction!


Solution

  • Thanks, but I assume this would only work if there is a single element between the start tag and the end tag.

    Anyway, I found an answer in the meantime thanks to some other helpful internet folks, so let me share what we came up with in the end:

    <xsl:template match="Content">
        <xsl:copy>
            <xsl:for-each-group select="node()" group-ending-with="EndUnderline">
                <!-- First <Underline/> marker in the current group, if there is one -->
                <xsl:variable name="start" select="current-group()[self::Underline][1]"/>
                <xsl:choose>
                    <xsl:when test="$start">
                        <!-- The group contains at least one <Underline/> marker: every node before the first marker is transformed as standard -->
                        <xsl:apply-templates select="current-group()[$start >> .]"/>
                        <!-- Every node after the first marker, up to the <EndUnderline/> that closes the group, is wrapped in a single <u> tag; the marker elements themselves are dropped -->
                        <u>
                            <xsl:apply-templates select="current-group()[. >> $start][not(self::Underline)][not(self::EndUnderline)]"/>
                        </u>
                    </xsl:when>
                    <xsl:otherwise>
                        <!-- No <Underline/> marker in this group: apply the standard transformation to the whole group -->
                        <xsl:apply-templates select="current-group()"/>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>
    <!-- Get rid of standalone end tags -->
    <xsl:template match="EndUnderline"/>
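
    For anyone who wants to try this end to end, here is a minimal sketch of a complete stylesheet. It assumes that the identity template and the xsl:output settings from the question are kept as they are and simply combined with the two templates above. That wiring is an assumption, since only the two templates were posted. It needs an XSLT 2.0 processor (Saxon, for example).

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <!-- Identity template, taken from the question (assumed unchanged):
         copies every node that no other template handles -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="Content">
        <xsl:copy>
            <!-- Each group ends at an EndUnderline element (the last group may have none) -->
            <xsl:for-each-group select="node()" group-ending-with="EndUnderline">
                <!-- First Underline marker in this group; empty if the group has none -->
                <xsl:variable name="start" select="current-group()[self::Underline][1]"/>
                <xsl:choose>
                    <xsl:when test="$start">
                        <!-- Nodes that come before the first marker stay outside the <u> -->
                        <xsl:apply-templates select="current-group()[$start >> .]"/>
                        <!-- Nodes after the first marker go inside one <u>;
                             repeated Underline markers and the closing EndUnderline are dropped -->
                        <u>
                            <xsl:apply-templates select="current-group()[. >> $start][not(self::Underline)][not(self::EndUnderline)]"/>
                        </u>
                    </xsl:when>
                    <xsl:otherwise>
                        <!-- No Underline marker in this group: process it normally -->
                        <xsl:apply-templates select="current-group()"/>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>

    <!-- EndUnderline elements without a preceding Underline produce no output -->
    <xsl:template match="EndUnderline"/>
</xsl:stylesheet>
```

    Run against the sample file from the question, the second Para should come out with <u>[text_to_be_underlined]</u> followed by the untouched <p> element. Whitespace-only text nodes that sit between the markers also end up inside the <u>, so the exact whitespace may differ from the hand-written target output.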