
XSL 1.0: how to split a string without cutting words in the middle


I have to improve how long strings are split in XSL. The line size is 60 characters. When a long string appears, it is split into lines in an inelegant way. I am trying to implement a mechanism that takes spaces into account, so that words are not cut in the middle.

Currently the code looks like this:

<xsl:template name="split_text">
    <xsl:param name="sText"/>
    <xsl:param name="lineSize">60</xsl:param>

    <xsl:variable name="toDisplay" saxon:assignable="yes"/>
    <xsl:variable name="toProcess" saxon:assignable="yes" select="$sText"/>

    <saxon:while test="string-length($toProcess) > $lineSize">
        <saxon:assign name="toDisplay" select="substring($toProcess, 1, $lineSize)"/>
        <saxon:assign name="toProcess" select="substring($toProcess, $lineSize + 1)"/>
        <xsl:value-of select="$toDisplay"/><br/>
    </saxon:while>
    <xsl:value-of select="$toProcess"/>
</xsl:template>

It simply cuts the text whenever it is longer than the line capacity. I want to handle the case where the line capacity ends in the middle of a word. I read about tokenizers, substring-before-last, etc., but I got exceptions in Java when I tried them. I am probably working with too old an XSL version, but upgrading is not possible, so I have to use what I have.

I am wary of relying on the last space character in every line, because the input can be a long character sequence without any spaces, and then the best option is still the code I pasted above. Is there a simple way to tokenize in XSL?

Should I tokenize the full string and append each next token as long as the total length stays below the line capacity? Or should I check whether the last character in the line is a space and then do some additional operations?

I am quite confused; this is my first encounter with XSL.

ADDITIONAL EDIT: I found the function saxon:tokenize, which looks interesting. The description in the documentation sounds great; it is what I need. But is it possible to use it with XSL 1.0 and my Saxon version? Here is a paste from the Manifest:

Manifest-Version: 1.0
Main-Class: com.icl.saxon.StyleSheet
Created-By: 1.3.1_16 (Sun Microsystems Inc.)

If yes, how do I iterate over the result? I found various styles of iterating on the web, and I don't understand the differences, pros, and cons between them.

Solution

  • Okay, I have done it, so I will share my solution in case somebody has a similar problem.

    <xsl:template name="split_text">
        <xsl:param name="sText"/>
        <xsl:param name="lineSize">60</xsl:param>

        <xsl:variable name="remainder" saxon:assignable="yes" select="''"/>
        <xsl:variable name="textTokens" select="saxon:tokenize($sText)"/>

        <!-- Visit the tokens (words) one by one -->
        <xsl:for-each select="$textTokens">
            <!-- The word handled in this iteration -->
            <xsl:variable name="currentToken" select="string(.)"/>
            <xsl:choose>
                <!-- Once the line has reached lineSize, print it and start the next line with the current word -->
                <xsl:when test="string-length($remainder) >= $lineSize">
                    <xsl:value-of select="$remainder"/><br/>
                    <saxon:assign name="remainder" select="$currentToken"/>
                </xsl:when>
                <!-- Otherwise keep appending words to the line -->
                <xsl:otherwise>
                    <saxon:assign name="remainder" select="normalize-space(concat($remainder, ' ', $currentToken))"/>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:for-each>
        <!-- Print whatever is left after the last word -->
        <xsl:value-of select="$remainder"/>
    </xsl:template>
    

    I used Saxon's tokenize and iterated over the list of tokens, checking the line length on every iteration.
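
For anyone who cannot rely on the Saxon extensions, the same idea can probably be done in plain XSLT 1.0 with recursion. What follows is only a minimal sketch under a few assumptions: the template names wrap_text and last_space, the parameter names, and the demo template matching "/" are made up for illustration and do not come from the code above; output is assumed to be HTML, where <br/> ends a line. Unlike the loop above, it breaks at the last space that still fits within lineSize, and it falls back to a hard cut at lineSize when a single word is longer than the line, which covers the case of input without any spaces.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Demo entry point (hypothetical): wraps the text of the document element -->
  <xsl:template match="/">
    <xsl:call-template name="wrap_text">
      <xsl:with-param name="text" select="string(/*)"/>
      <xsl:with-param name="lineSize" select="60"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Prints $text in lines of at most $lineSize characters, separated by <br/> -->
  <xsl:template name="wrap_text">
    <xsl:param name="text"/>
    <xsl:param name="lineSize" select="60"/>
    <!-- Collapse whitespace runs so that every gap is exactly one space -->
    <xsl:variable name="rest" select="normalize-space($text)"/>
    <xsl:choose>
      <!-- Everything that is left fits on one line -->
      <xsl:when test="string-length($rest) &lt;= $lineSize">
        <xsl:value-of select="$rest"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- Position of the last space within the first lineSize + 1 characters, or 0 -->
        <xsl:variable name="breakAt">
          <xsl:call-template name="last_space">
            <xsl:with-param name="s" select="substring($rest, 1, $lineSize + 1)"/>
          </xsl:call-template>
        </xsl:variable>
        <xsl:choose>
          <!-- A space was found: break there and drop the space itself -->
          <xsl:when test="$breakAt &gt; 0">
            <xsl:value-of select="substring($rest, 1, $breakAt - 1)"/><br/>
            <xsl:call-template name="wrap_text">
              <xsl:with-param name="text" select="substring($rest, $breakAt + 1)"/>
              <xsl:with-param name="lineSize" select="$lineSize"/>
            </xsl:call-template>
          </xsl:when>
          <!-- No space in range: the word is longer than the line, so cut it -->
          <xsl:otherwise>
            <xsl:value-of select="substring($rest, 1, $lineSize)"/><br/>
            <xsl:call-template name="wrap_text">
              <xsl:with-param name="text" select="substring($rest, $lineSize + 1)"/>
              <xsl:with-param name="lineSize" select="$lineSize"/>
            </xsl:call-template>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Returns the 1-based position of the last space in $s, or 0 if there is none -->
  <xsl:template name="last_space">
    <xsl:param name="s"/>
    <xsl:param name="pos" select="string-length($s)"/>
    <xsl:choose>
      <xsl:when test="$pos &lt; 1">0</xsl:when>
      <xsl:when test="substring($s, $pos, 1) = ' '">
        <xsl:value-of select="$pos"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:call-template name="last_space">
          <xsl:with-param name="s" select="$s"/>
          <xsl:with-param name="pos" select="$pos - 1"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

To use it from an existing stylesheet, it should be enough to copy the two named templates and call wrap_text the same way the demo template does. Recursion depth grows with the number of output lines, so for very long inputs the loop-based Saxon version may still be the more comfortable choice.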