
Group text nodes together with element nodes based on a few starting text patterns


Please suggest how to group text() nodes together with element nodes when the text contains one of a few citation words (Fig.|Figs.|Figure|Table|Tables). If the citation starts and ends with brackets such as (, [, { and ), ], }, the group should enclose the brackets too; otherwise only the Fig/Table word plus the following xref element(s) should be grouped within <col1>***</col1>.

This grouping should apply to all text() nodes except those under the 'Refs' element.

Input:

<root>
    <Para>The citations are like (Fig. <xref refID="f1">1</xref>).</Para>
    <Para>The <b>citations are like (Fig. <xref refID="f1">1</xref>).</b></Para>
    <Extract>The citations are like (Figs. <xref refID="f1">1</xref> and <xref refID="f2">2</xref>).</Extract>
    <DispQuote>The citations are like (Tables <xref refID="t1">1</xref> and <xref refID="t2">2</xref>).</DispQuote>
    <Para1>The citations are like (Tables <xref refID="t1">1</xref> and <xref refID="t2">2</xref>; Fig. <xref refID="f1">1</xref>).</Para1>
    <Para2>The citations are like (analysation of Fig. <xref refID="f1">1</xref>).</Para2>
    <Para>The citations are like (explained in Figs. <xref refID="f1">1</xref> and <xref refID="f2">2</xref>).</Para>
    <Para>The citations are like (Chapter 1 and 3 are explained in Tables <xref refID="t1">1</xref> and <xref refID="t2">2</xref>).</Para>
    <Refs>The citations are like (Fig. <xref refID="f1">1</xref>).</Refs>
</root>

XSLT2:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
</xsl:template>
<xsl:template match="Para">
    <xsl:copy><xsl:call-template name="tempCrossRef1"/></xsl:copy>
</xsl:template>

<xsl:template name="tempCrossRef1">
    <!--xsl:analyze-string select="." regex="\([ ]+)|([\+])|([=])|([%])|([/])|([\[])|([\]])"-->
    <!-- (Fig. <xref refID="f1">1</xref>) -->
    <!--xsl:analyze-string select="node()" regex="\(Fig. ">
        <xsl:matching-substring>
            <xsl:choose>
                <xsl:when test="following-sibling::node()[2][parent::*/name()='xref']">
                    <col><xsl:apply-templates select="."/></col>
                </xsl:when>
                <xsl:otherwise><xsl:apply-templates select="."/></xsl:otherwise>
            </xsl:choose>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
            <xsl:value-of select="."/>
        </xsl:non-matching-substring>
    </xsl:analyze-string-->
    <xsl:for-each select="node()">
        <xsl:choose>
            <xsl:when test="ends-with(., 'Fig.')">
                <xsl:for-each-group select="self::node()[ends-with(., 'Fig.')]" group-adjacent="boolean(self::xref)">
                    <xsl:choose>
                        <xsl:when test="current-grouping-key()">
                            <xsl:apply-templates select="current-group()" />
                        </xsl:when>
                        <xsl:otherwise>
                            <p1>
                                <xsl:apply-templates select="current-group()" />
                            </p1>
                        </xsl:otherwise>
                    </xsl:choose>
                </xsl:for-each-group>
            </xsl:when>
            <xsl:otherwise>
                <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:for-each>
</xsl:template>

<xsl:template match="xref">
    <xref>
        <xsl:apply-templates select="@*"/>
        <xsl:apply-templates />
    </xref>
</xsl:template>
</xsl:stylesheet>

Required Result:

<root>
    <Para>The citations are like <col1>(Fig. <xref refID="f1">1</xref>)</col1>.</Para>
    <Para>The <b>citations are like <col1>(Fig. <xref refID="f1">1</xref>)</col1>.</b></Para>
    <Para>The citations are like <col1>(Fig. <xref refID="f1">1</xref>)</col1>.</Para>
    <Extract>The citations are like <col1>(Figs. <xref refID="f1">1</xref> and <xref refID="f2">2</xref>)</col1>.</Extract>
    <DispQuote>The citations are like <col1>(Tables <xref refID="t1">1</xref> and <xref refID="t2">2</xref>)</col1>.</DispQuote>
    <Para1>The citations are like <col1>(Tables <xref refID="t1">1</xref> and <xref refID="t2">2</xref>; Fig. <xref refID="f1">1</xref>)</col1>.</Para1>
    <Para2>The citations are like (analysation of <col1>Fig. <xref refID="f1">1</xref></col1>).</Para2>
    <Para>The citations are like (explained in <col1>Figs. <xref refID="f1">1</xref> and <xref refID="f2">2</xref></col1>).</Para>
    <Para>The citations are like (Chapter 1 and 3 are explained in <col1>Tables <xref refID="t1">1</xref> and <xref refID="t2">2</xref></col1>).</Para>
    <Refs>The citations are like (Fig. <xref refID="f1">1</xref>).</Refs><!-- Grouping is not required within this element -->
</root>

Solution

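  • Here is an attempt in two steps, written for XSLT 3.0 (it relies on xsl:mode; under XSLT 2.0 you would replace the two xsl:mode declarations with identity templates for the default mode and the text-to-elements mode). The first step transforms any match of the start pattern [(]?(Fig\.|Figs\.|Figure|Table[s]?) into a start element and any match of the end pattern [)] into an end element. The second step then uses group-starting-with and group-ending-with to wrap such content in col1: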

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        exclude-result-prefixes="xs"
        version="3.0">
    
      <xsl:param name="start-patterns" as="xs:string">[(]?(Fig\.|Figs\.|Figure|Table[s]?)</xsl:param>
      <xsl:param name="end-patterns" as="xs:string">[)]</xsl:param>
    
      <xsl:mode on-no-match="shallow-copy"/>
    
      <xsl:mode name="text-to-elements" on-no-match="shallow-copy"/>
    
      <xsl:template match="root/*[not(self::Refs)][matches(., $start-patterns)]">
          <xsl:copy>
              <xsl:variable name="text-to-elements" as="node()*">
                  <xsl:apply-templates mode="text-to-elements"/>
              </xsl:variable>
              <xsl:for-each-group select="$text-to-elements" group-starting-with="start">
                  <xsl:choose>
                      <xsl:when test="self::start">
                          <xsl:for-each-group select="current-group()" group-ending-with="end">
                              <xsl:choose>
                                  <xsl:when test="current-group()[last()][self::end]">
                                      <col1>
                                          <xsl:apply-templates select="current-group()"/>
                                      </col1>
                                  </xsl:when>
                                  <xsl:otherwise>
                                      <xsl:apply-templates select="current-group()"/>
                                  </xsl:otherwise>
                              </xsl:choose>
                          </xsl:for-each-group>                      
                      </xsl:when>
                      <xsl:otherwise>
                          <xsl:apply-templates select="current-group()"/>
                      </xsl:otherwise>
                  </xsl:choose>
              </xsl:for-each-group>
          </xsl:copy>
      </xsl:template>
    
      <xsl:template match="start | end">
          <xsl:apply-templates/>
      </xsl:template>
    
      <xsl:template match="text()" mode="text-to-elements">
          <xsl:analyze-string select="." regex="{$start-patterns}">
              <xsl:matching-substring>
                  <start>
                      <xsl:value-of select="."/>
                  </start>
              </xsl:matching-substring>
              <xsl:non-matching-substring>
                  <xsl:analyze-string select="." regex="{$end-patterns}">
                      <xsl:matching-substring>
                          <end>
                              <xsl:value-of select="."/>
                          </end>                      
                      </xsl:matching-substring>
                      <xsl:non-matching-substring>
                          <xsl:value-of select="."/>
                      </xsl:non-matching-substring>
                  </xsl:analyze-string>
              </xsl:non-matching-substring>
          </xsl:analyze-string>
      </xsl:template>
    
    </xsl:stylesheet>
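
    To make the first step concrete, here is a hand-derived sketch (traced from the stylesheet above, not output copied from the fiddle) of the node sequence that the text-to-elements mode should pass to the grouping step for the first Para of the input. The start and end elements are only temporary markers; the template matching start | end unwraps them again after grouping.

    ```xml
    <!-- Intermediate children of the first Para, traced by hand.
         "(Fig." matches $start-patterns, ")" matches $end-patterns;
         all other text is passed through unchanged. -->
    The citations are like <start>(Fig.</start> <xref refID="f1">1</xref><end>)</end>.
    ```

    With group-starting-with="start" a new group opens at the start marker, and the nested group-ending-with="end" closes it at the end marker, so everything from (Fig. up to and including ) lands in col1 while the trailing full stop stays outside.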
    

    As you can see at https://xsltfiddle.liberty-development.net/pPgCcow, this approach seems to produce the wanted result for your posted input, except for the element:

    <Para1>The citations are like (Tables <xref refID="t1">1</xref> and <xref refID="t2">2</xref>; Fig. <xref refID="f1">1</xref>).</Para1>
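
    Tracing the stylesheet above by hand (an assumption on my part, not output copied from the fiddle), the likely reason is that the second marker, Fig., opens a new group of its own. The group opened by (Tables then contains no end marker and is left unwrapped, while the group opened by Fig. runs up to the closing parenthesis:

    ```xml
    <!-- Hand-traced result for Para1. It differs from the required result,
         where col1 should span from "(Tables" to the closing ")". -->
    <Para1>The citations are like (Tables <xref refID="t1">1</xref> and <xref refID="t2">2</xref>; <col1>Fig. <xref refID="f1">1</xref>)</col1>.</Para1>
    ```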