numbers xslt-1.0 exslt

How to find all numbers in a string with XSLT 1


There are some nice solutions to "How to find all numbers in a string" for XSLT 2 and even 3. How can I accomplish the exact same thing within the limits of XSLT 1 (with the possible help of EXSLT)?

Here’s an example:

<data>
  <sig>NL Mellin 1-1 36</sig>
  <sig>NL Mellin 1-1 38</sig>
  <sig>NL Mellin 1-10 02</sig>
  <sig>NL Mellin 1-10 04</sig>
  <sig>NL Mellin 1-10 09</sig>
</data>

The desired output would be:

1 1 36
1 1 38
1 10 02
1 10 04
1 10 09

Solution

  • Try it this way:

    XSLT 1.0

    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" encoding="utf-8" />
    
    <xsl:template match="/">
        <xsl:for-each select="data/sig">
            <xsl:call-template name="tokenize">
                <xsl:with-param name="text" select="translate(., '-', ' ')"/>
            </xsl:call-template>
            <xsl:if test="position()!=last()">
                <xsl:text>&#10;</xsl:text>
            </xsl:if>
        </xsl:for-each>
    </xsl:template>
    
    <xsl:template name="tokenize">
        <xsl:param name="text"/>
        <xsl:param name="delimiter" select="' '"/>
        <xsl:variable name="token" select="substring-before(concat($text, $delimiter), $delimiter)"/>
        <!-- output the token only if it is non-empty and consists solely of digits -->
        <xsl:if test="$token != '' and $token = translate($token, translate($token, '0123456789', ''), '')">
            <xsl:value-of select="$token"/>
            <xsl:text> </xsl:text>
        </xsl:if>
        <xsl:if test="contains($text, $delimiter)">
            <!-- recursive call on the remainder, keeping the same delimiter -->
            <xsl:call-template name="tokenize">
                <xsl:with-param name="text" select="substring-after($text, $delimiter)"/>
                <xsl:with-param name="delimiter" select="$delimiter"/>
            </xsl:call-template>
        </xsl:if>
    </xsl:template>
    
    </xsl:stylesheet>
    

    Note:

    1. If you have multiple delimiters, you need to translate them to a common character (space in my example);

    2. I didn't bother to remove the trailing space in each line;

    3. If your processor supports the EXSLT str:tokenize() function, this could be simpler.
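
    For illustration, here is a minimal sketch of what the EXSLT variant from note 3 might look like, assuming your processor implements the EXSLT strings module (libxslt/xsltproc does, for example). Because str:tokenize() treats every character of its second argument as a delimiter, the translate() step from note 1 is not needed, and testing position() against last() avoids the trailing space from note 2. The `. != ''` guard is a precaution in case a processor returns empty tokens for consecutive delimiters.

    ```xslt
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:str="http://exslt.org/strings"
        extension-element-prefixes="str">
    <xsl:output method="text" encoding="utf-8"/>

    <xsl:template match="/">
        <xsl:for-each select="data/sig">
            <!-- split on space or hyphen; keep only non-empty tokens
                 that have nothing left once all digits are removed -->
            <xsl:for-each select="str:tokenize(., ' -')[. != '' and not(translate(., '0123456789', ''))]">
                <xsl:value-of select="."/>
                <!-- separator between numbers, none after the last one -->
                <xsl:if test="position() != last()">
                    <xsl:text> </xsl:text>
                </xsl:if>
            </xsl:for-each>
            <!-- newline between sig elements, none after the last one -->
            <xsl:if test="position() != last()">
                <xsl:text>&#10;</xsl:text>
            </xsl:if>
        </xsl:for-each>
    </xsl:template>

    </xsl:stylesheet>
    ```

    Applied to the sample input, this should produce the desired output shown in the question. If the processor lacks str:tokenize(), the recursive named template above remains the portable option.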