
ampersand entities in long strings make template malfunction


I'm using a template I found on the internet to split long strings into chunks. It works fine with most text, but if the long input contains an entity such as '&amp;', the output chunk comes out too long.

The consumer of this data needs long description values in certain entities broken down into fixed-length pieces. So this data is fine:

<tag>
    <text>This is a long string 1This is a long string 2This is a long string 3This is a long string 4</text>
</tag>

resulting output:

<?xml version="1.0" encoding="UTF-8"?>
<tag>
    <text>
      <text>This is a long string 1</text>
      <text>This is a long string 2</text>
      <text>This is a long string 3</text>
      <text>This is a long string 4</text>
   </text>
</tag>

With this data, the first output string is too long:

<tag>
    <text>&amp;This is a long string 1This is a long string 2This is a long string 3This is a long string 4</text>
</tag>

resulting output:

<?xml version="1.0" encoding="UTF-8"?>
<tag>
    <text>
      <text>&amp;This is a long string </text>
      <text>1This is a long string </text>
      <text>2This is a long string </text>
      <text>3This is a long string </text>
      <text>4</text>
   </text>
</tag>

I tried changing the output method to html, but that didn't change the behavior. In any case, the output is supposed to be XML.

I'm not sure the problem is really solvable, since the XML is only the middleman and the actual source and destination are ultimately plain-text database fields, but I'd like to chunk the long string into short strings of exactly the desired length.

Here is the template. The desired chunk size here is 23:

<?xml version='1.0'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:max="http://www.ibm.com/maximo" exclude-result-prefixes="max">
  <xsl:output method="xml" encoding="utf-8" indent="yes"/>

  <!-- Maximum number of characters per output chunk -->
  <xsl:variable name="pChunkSize" select="23"/>

  <!-- Identity template: copy everything that is not handled below -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Recursively emit one <text> element per $pChunkSize characters -->
  <xsl:template match="text/text()" name="chunk">
    <xsl:param name="pText" select="."/>

    <xsl:if test="string-length($pText) > 0">
      <text>
        <xsl:value-of select="substring($pText, 1, $pChunkSize)"/>
      </text>
      <xsl:call-template name="chunk">
        <xsl:with-param name="pText" select="substring($pText, $pChunkSize + 1)"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>

I have seen some discussion about this, and it might be intractable, since cutting '&amp;' in the middle (if it fell towards the end of a segment) would result in invalid XML.


Solution

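  • Your stylesheet is working correctly. XSLT operates on the parsed document, where the entity reference &amp; is the single character '&', so string-length() and substring() count it as one character and can never cut it in half. The first chunk, '&This is a long string ', is exactly 23 characters long. You have divided the 93-character text into four pieces of 23 characters plus a one-character remainder, and each of those strings is correctly written out in its proper XML representation, which for '&' is &amp;.

    If the consumer can't handle that, it must be because they aren't processing the XML with a conformant XML parser, so the problem is at their end, not at yours.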
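
    To check this outside of XSLT, here is a minimal sketch in Python. It is not part of the original stylesheet: it assumes the standard library's xml.etree.ElementTree as a stand-in for whatever conformant parser the consumer might use, and the chunk() helper and CHUNK_SIZE name are made up for illustration. It decodes the sample input, chunks the decoded text, serializes the pieces, and reads them back.

```python
import xml.etree.ElementTree as ET

CHUNK_SIZE = 23  # hypothetical name; same value as $pChunkSize in the stylesheet

source = (
    "<tag><text>&amp;This is a long string 1This is a long string 2"
    "This is a long string 3This is a long string 4</text></tag>"
)


def chunk(text, size):
    """Split text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


# A conformant parser decodes &amp; to the single character '&'.
decoded = ET.fromstring(source).find("text").text
pieces = chunk(decoded, CHUNK_SIZE)

# Rebuild the nested <text> output; the serializer escapes '&' again.
tag = ET.Element("tag")
outer = ET.SubElement(tag, "text")
for piece in pieces:
    ET.SubElement(outer, "text").text = piece
serialized = ET.tostring(tag, encoding="unicode")

# Reading the serialized output back yields the same decoded values.
round_trip = [element.text for element in ET.fromstring(serialized).find("text")]

print([len(piece) for piece in pieces])  # [23, 23, 23, 23, 1]
print(round_trip == pieces)              # True
```

    The serialized markup for the first chunk is longer than 23 characters only because '&' is written as &amp;. Once the consumer parses the XML, the value that would land in the database field is 23 characters again.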