xslt xslt-2.0 xslt-3.0

How to remove the first occurrence of a word with XSLT?


Given the following XML document:

<books>
  <book>
   <name>The problem of the ages</name>
  </book>
  <book>
   <name>Filtering the tap</name>
  </book>
  <book>
   <name>Legend of Atlantis</name>
  </book>
</books>

I want to remove the first "the" from the name of each book. Example of output:

<library>
  <record>problem of the ages</record>
  <record>Filtering tap</record>
  <record>Legend of Atlantis</record>
</library>

How would I achieve this using a single XSLT?


Solution

  • It is difficult to decide what a word is, given the different languages that exist in the world. However, the regular expression language used in XSLT/XPath 2 and later allows you to match word characters with \w, so

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:fn="http://www.w3.org/2005/xpath-functions"
        exclude-result-prefixes="#all"
        version="3.0">
        
      <xsl:param name="word-to-eliminate" as="xs:string" select="'the'"/>
      
      <xsl:output indent="yes"/>
    
      <xsl:mode on-no-match="shallow-copy"/>
    
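      <!-- analyze-string() returns an fn:analyze-string-result element whose children
           are fn:match and fn:non-match elements, e.g. for 'The problem of the ages':
           fn:match 'The', fn:non-match ' problem of ', fn:match 'the', fn:non-match ' ages'.
           Note that the bare word is used as the pattern, so it also matches inside
           longer words such as 'other'. -->
      <xsl:template match="book/name">
          <xsl:copy>
              <xsl:apply-templates select="analyze-string(., $word-to-eliminate, 'i')" mode="eliminate-first"/>
          </xsl:copy>
      </xsl:template>
      
      <!-- Drop the first match only; later matches are output by the built-in template rules -->
      <xsl:template match="fn:match[1]" mode="eliminate-first"/>
      
      <!-- Strip one leading whitespace character from the text that follows the removed match -->
      <xsl:template match="fn:non-match[preceding-sibling::node()[1][. is root()/descendant::fn:match[1]]]" mode="eliminate-first">
          <xsl:value-of select="replace(., '^\s', '')"/>
      </xsl:template>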
    
    </xsl:stylesheet>
    

    might help in XSLT 3, or could be achieved in a similar way in XSLT 2 using xsl:analyze-string instead.
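
    For XSLT 2, one possible sketch with xsl:analyze-string is shown here. It is not a direct port of the stylesheet above: it assumes an anchored regular expression with a reluctant quantifier, so that only the first occurrence of the word (and any whitespace right after it) is dropped. It also assumes that $word-to-eliminate contains no regex metacharacters, and like the XSLT 3 version it would match the word inside longer words such as "other".

    ```xslt
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        exclude-result-prefixes="#all"
        version="2.0">

      <xsl:param name="word-to-eliminate" as="xs:string" select="'the'"/>

      <xsl:output indent="yes"/>

      <!-- Identity template; XSLT 2.0 has no xsl:mode on-no-match="shallow-copy" -->
      <xsl:template match="@* | node()">
        <xsl:copy>
          <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
      </xsl:template>

      <xsl:template match="book/name">
        <xsl:copy>
          <!-- regex is an attribute value template, so {$word-to-eliminate} is expanded.
               Flags: i = case-insensitive, s = "." also matches newlines. -->
          <xsl:analyze-string select="."
              regex="^(.*?){$word-to-eliminate}\s*(.*)$" flags="is">
            <!-- The word occurs: the whole string matches; keep the text before and after it -->
            <xsl:matching-substring>
              <xsl:value-of select="concat(regex-group(1), regex-group(2))"/>
            </xsl:matching-substring>
            <!-- The word does not occur: output the name unchanged -->
            <xsl:non-matching-substring>
              <xsl:value-of select="."/>
            </xsl:non-matching-substring>
          </xsl:analyze-string>
        </xsl:copy>
      </xsl:template>

    </xsl:stylesheet>
    ```

    Because (.*?) is reluctant, the pattern stops at the first occurrence of the word, so "The problem of the ages" should keep its second "the".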

    Or, if any whitespace can be treated as a word separator and you only want a single space between the remaining words in the result, then

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:fn="http://www.w3.org/2005/xpath-functions"
        exclude-result-prefixes="#all"
        version="3.0">
        
      <xsl:param name="word-to-eliminate" as="xs:string" select="'the'"/>
      
      <xsl:output indent="yes"/>
    
      <xsl:mode on-no-match="shallow-copy"/>
    
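      <!-- Split the name on whitespace, drop the first token equal to the word
           (compared in lower case), and let xsl:value-of join the rest with single spaces -->
      <xsl:template match="book">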
          <record>
              <xsl:value-of 
                select="analyze-string(name, '\s+')
                        !
                        (fn:non-match 
                         except 
                         fn:non-match[lower-case(.) = $word-to-eliminate][1]
                        )"/>
          </record>
      </xsl:template>
      
    </xsl:stylesheet>
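
    Both stylesheets above leave the books root element as it is, whereas the example output in the question uses library. A minimal sketch of the second stylesheet with that renaming added could look as follows; the books template and the lower-case() call around the parameter are my additions, not part of the stylesheet above.

    ```xslt
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:fn="http://www.w3.org/2005/xpath-functions"
        exclude-result-prefixes="#all"
        version="3.0">

      <xsl:param name="word-to-eliminate" as="xs:string" select="'the'"/>

      <xsl:output indent="yes"/>

      <!-- Assumed addition: map the books root to the library element from the question -->
      <xsl:template match="books">
        <library>
          <xsl:apply-templates select="book"/>
        </library>
      </xsl:template>

      <!-- Same token-based removal as above; the parameter is lower-cased as well,
           so a value such as 'The' would still match -->
      <xsl:template match="book">
        <record>
          <xsl:value-of
            select="analyze-string(name, '\s+')
                    !
                    (fn:non-match
                     except
                     fn:non-match[lower-case(.) = lower-case($word-to-eliminate)][1]
                    )"/>
        </record>
      </xsl:template>

    </xsl:stylesheet>
    ```

    Since tokens are compared as whole strings, a word with punctuation attached (for example "the,") would not be removed by this approach.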