
Saxon XSLT 2.0: Extracting Numbers from a String


I am trying to extract integers from a string using XSLT 2.0. For example, given the string "designa80000dd5424d", I need the two integers inside it, i.e. "80000" and "5424".

I tried using the translate() function as below:

select="translate($term,translate($term, '0123456789', ''), '')"

But it combines both integers and gives the output "800005424". I need something that keeps them separate.


Solution

    I. Here is a complete XSLT 1.0 solution:

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output method="text"/>
    
      <xsl:template match="/*">
        <xsl:variable name="vSpaces">
          <xsl:call-template name="makeSpaces"/>
        </xsl:variable>
    
        <xsl:variable name="vtheNumbers" 
             select="normalize-space(translate(., translate(.,'0123456789',''), $vSpaces))"/>
    
        <xsl:call-template name="tokenize">
          <xsl:with-param name="pStr" select="$vtheNumbers"/>
        </xsl:call-template>
      </xsl:template>
    
      <xsl:template name="tokenize">
        <xsl:param name="pStr"/>
        <xsl:param name="pInd" select="1"/>
    
        <xsl:if test="string-length($pStr)">
          <xsl:value-of select=
               "concat($pInd, ': ',substring-before(concat($pStr, ' '), ' '), '&#xA;')"/>
    
          <xsl:call-template name="tokenize">
            <xsl:with-param name="pStr" select="substring-after($pStr, ' ')"/>
            <xsl:with-param name="pInd" select="$pInd +1"/>
          </xsl:call-template>
        </xsl:if>
      </xsl:template>
    
      <xsl:template name="makeSpaces">
        <xsl:param name="pLen" select="string-length(.)"/>
    
        <xsl:choose>
          <xsl:when test="$pLen = 1">
            <xsl:value-of select="' '"/>
          </xsl:when>
          <xsl:when test="$pLen > 1">
            <xsl:variable name="vHalfLen" select="floor($pLen div 2)"/>
    
            <xsl:call-template name="makeSpaces">
              <xsl:with-param name="pLen" select="$vHalfLen"/>
            </xsl:call-template>
            <xsl:call-template name="makeSpaces">
              <xsl:with-param name="pLen" select="$pLen -$vHalfLen"/>
            </xsl:call-template>
          </xsl:when>
        </xsl:choose>
      </xsl:template>
    </xsl:stylesheet>
    

    When this transformation is applied to the following XML document:

    <t>designa80000dd5424dan1733g122</t>
    

    the wanted, correct result is produced:

    1: 80000
    2: 5424
    3: 1733
    4: 122
    

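    Note:

    The last argument of the outer translate() is a string of spaces with the same length as the input string (built by the makeSpaces template). This guarantees that every non-digit character has a space to be mapped to, so each run of non-digits becomes a separator instead of being deleted.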
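
To make the double translate() step concrete, here is a minimal sketch that traces it on a short hard-coded string. The sample value 'a12b3' and the variable names vIn, vNonDigits and vSpaced are made up for illustration; they are not part of the solution above.

```xslt
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Hypothetical sample input, hard-coded for illustration -->
    <xsl:variable name="vIn" select="'a12b3'"/>

    <!-- Step 1: delete every digit; what remains ('ab') is the set of
         characters we want to get rid of -->
    <xsl:variable name="vNonDigits" select="translate($vIn, '0123456789', '')"/>

    <!-- Step 2: map each of those characters to a space (' 12 3').
         Five spaces are supplied, one per input character, so the
         replacement string can never be too short -->
    <xsl:variable name="vSpaced" select="translate($vIn, $vNonDigits, '     ')"/>

    <!-- Step 3: trim and collapse the spaces, giving '12 3' -->
    <xsl:value-of select="normalize-space($vSpaced)"/>
  </xsl:template>
</xsl:stylesheet>
```

Applied to any XML document, this outputs `12 3`. If the third argument of the outer translate() were the empty string, as in the question, the non-digits would be deleted rather than replaced, which is why the numbers ran together.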


    II. XPath 2.0: shorter and simpler

    This XPath 2.0 expression when evaluated produces the wanted sequence of numbers:

    tokenize(., '[^\d]+')[.]
    

    Here is an XSLT-based verification:

    <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
    
      <xsl:template match="/*">
        <xsl:variable name="vNumbers" 
        select="tokenize(., '[^\d]+')[.]"/>
    
        <xsl:for-each select="$vNumbers">
          <xsl:value-of select="concat(position(), ': ', ., '&#xA;')"/>
        </xsl:for-each>
      </xsl:template>
    </xsl:stylesheet>
    

    When this transformation is applied to the same XML document:

    <t>designa80000dd5424dan1733g122</t>
    

    the same correct result is produced:

    1: 80000
    2: 5424
    3: 1733
    4: 122
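
As an alternative to splitting on the non-digits, XSLT 2.0 can also match the digit runs directly with xsl:analyze-string. The following is only a sketch under the assumption that the string to search is held in a variable named term, as in the question; here it is simply filled from the document element.

```xslt
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/*">
    <!-- Assumption: $term stands in for the variable used in the question -->
    <xsl:variable name="term" select="string(.)"/>

    <!-- The regex matches each maximal run of digits; text between the
         matches is ignored because there is no xsl:non-matching-substring -->
    <xsl:analyze-string select="$term" regex="\d+">
      <xsl:matching-substring>
        <!-- Inside this element the context item is the matched substring -->
        <xsl:value-of select="."/>
        <xsl:text>&#xA;</xsl:text>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

Run against the same `<t>` document, this should print each number on its own line. Note that the regex attribute is an attribute value template, so a pattern containing curly braces (for example `\d{{2,}}`) needs them doubled.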