Tags: xml, xslt, escaping, xslt-1.0

Get escaped " (&quot;) and ' (&apos;) chars in XML output after XSL transformation


I need to escape " and ' in an XML file to &quot; and &apos; using XSLT 1.0, because the consuming system cannot handle these characters inside an XML element.

I've run into two issues that I haven't solved.

  1. Being able to match ' at all.
  2. Getting my " to stay escaped (while not ruining the whole string).

I'm starting to believe this cannot be done. If I'm wrong, please let me know.

XML file:

<?xml version="1.0" encoding="UTF-8"?>
<foo>
   <bar>
      <att name="name">Let's fix these "errors".</att>
   </bar>
</foo>

The wanted output would be something like this:

<?xml version="1.0" encoding="UTF-8"?>
<foo>
   <bar>
      <att name="name">Let&apos;s fix these &quot;errors&quot;.</att>
   </bar>
</foo>

XSL:

<?xml version="1.0" encoding="UTF-8"?>    
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="att[@name ='name']">
        <xsl:param name="content" select="text()" />
        <xsl:variable name="replaceApos">
            <xsl:call-template name="string-replace-all">
                <xsl:with-param name="text" select="$content" />
                <xsl:with-param name="replace" select="'&#39;'" />
                <xsl:with-param name="by" select="'&apos;'" />
            </xsl:call-template>
        </xsl:variable>
        <xsl:variable name="replaceQuot">
            <xsl:call-template name="string-replace-all">
                <xsl:with-param name="text" select="$replaceApos" />
                <xsl:with-param name="replace" select="'&quot;'" />
                <xsl:with-param name="by" select="'&amp;quot;'" />
            </xsl:call-template>
        </xsl:variable>

        <att name="name"><xsl:value-of select="$replaceQuot"/></att>
    </xsl:template>
    
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()" />
        </xsl:copy>
    </xsl:template>
   
    <xsl:template name="string-replace-all">
        <xsl:param name="text" />
        <xsl:param name="replace" />
        <xsl:param name="by" />
        <xsl:choose>
           <xsl:when test="$text = '' or $replace = ''or not($replace)" > 
                <!-- Prevent this routine from hanging -->
                       <xsl:value-of select="$text" />
            </xsl:when> 
            <xsl:when test="contains($text, $replace)">
                <xsl:value-of select="substring-before($text,$replace)" />
                <xsl:value-of select="$by" />
                <xsl:call-template name="string-replace-all">
                    <xsl:with-param name="text" select="substring-after($text,$replace)" />
                    <xsl:with-param name="replace" select="$replace" />
                    <xsl:with-param name="by" select="$by" />
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$text" />
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template> 
</xsl:stylesheet>

(Note: the XSL above won't validate with either &#39; or &apos;.)

Q1. Regardless of whether I use &apos;, &#39; or &#x0027;, my validator complains and won't allow the match. Is there any way around this? Is this expected?
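For context, the validator's complaint is expected. The XML parser expands &apos;, &#39; and &#x0027; inside the select attribute before the XPath engine ever sees the expression. select="'&#39;'" therefore reaches XPath as ''': an empty literal followed by an unterminated one. The minimal sketch below uses Python's standard xml.etree.ElementTree to show that expansion. The element name p is a hypothetical placeholder, not part of the original stylesheet.

```python
import xml.etree.ElementTree as ET

# Three ways of writing an apostrophe inside an XSLT-style select attribute.
snippets = [
    """<p select="'&apos;'"/>""",
    """<p select="'&#39;'"/>""",
    """<p select="'&#x0027;'"/>""",
]

for s in snippets:
    # The XML parser resolves the entity / character reference first,
    # so an XPath engine only ever receives the expanded attribute value.
    print(ET.fromstring(s).get("select"))  # prints ''' each time

# Delimiting the literal with double quotes (written as &quot; inside a
# double-quoted attribute) yields a valid XPath string literal instead.
ok = ET.fromstring("""<p select="&quot;'&quot;"/>""").get("select")
print(ok)  # prints "'"
```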

Q2. If I skip the apostrophe and focus only on ": if I set by to &quot;, I get " in the output. If I set by to &amp;quot;, I get &amp;quot; in the output rather than just &quot;.

In my validator I can see that the variable replaceQuot is correct, i.e. Let's fix these &quot;errors&quot;, but the XML output is not "correct".
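This result is also expected. Inside the stylesheet, '&quot;' is parsed to a single " character, so by='&quot;' simply swaps " for ". By contrast, '&amp;quot;' is parsed to the six literal characters &quot;, and an XML serializer must then write that leading & as &amp;. The sketch below uses Python's ElementTree serializer as a stand-in for XSLT's xml output method. It is an analogy, not the XSLT processor itself.

```python
import xml.etree.ElementTree as ET

att = ET.Element("att", name="name")
# What the replaceQuot variable holds after by='&amp;quot;': the parser has
# already turned &amp; into &, leaving the literal characters "&quot;".
att.text = 'Let\'s fix these &quot;errors&quot;.'

# Like XSLT's xml output method, the serializer escapes every literal '&'
# in text content so that the document stays well formed.
out = ET.tostring(att, encoding="unicode")
print(out)  # <att name="name">Let's fix these &amp;quot;errors&amp;quot;.</att>
```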

If I use <att name="name"><xsl:value-of select="$replaceQuot" disable-output-escaping="yes"/></att>, the text comes out as is (I get the expected, wanted text). What makes this solution impossible is that the field can also contain <, & and >, which must stay escaped (I need valid XML).
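The disable-output-escaping problem shows up in a small simulation. The string is written verbatim, so the replacement entities survive, but so does any bare & from the field, and the result is no longer well-formed XML. The field value is hypothetical. The string concatenation only mimics what d-o-e output would look like and does not run an XSLT processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical field content that also contains an ampersand.
field = 'Tom & Jerry\'s "show"'

# With disable-output-escaping="yes", the replaced string is emitted as is:
# &apos; and &quot; come out as intended, but the bare '&' does too.
doe_output = (
    '<att name="name">'
    + field.replace("'", "&apos;").replace('"', "&quot;")
    + "</att>"
)
print(doe_output)

# Re-parsing the result shows that it is no longer well-formed XML.
try:
    ET.fromstring(doe_output)
    well_formed = True
except ET.ParseError:
    well_formed = False
print(well_formed)  # False
```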

Is there any way to work around this? Character maps, which I guess could solve this, are only available from XSLT 2.0 as I understand it.

My take is that this is impossible to manage with XSLT 1.0, and that I need to either push the consuming system to fix its import functionality or solve it in a post-processing step with another tool. Am I wrong?
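If the post-processing route is taken, the step could be as small as the following Python sketch, which uses only the standard library (xml.etree.ElementTree and xml.sax.saxutils). The function names serialize and post_process are hypothetical. The sketch assumes a document without namespaces, comments or processing instructions, and it writes empty elements as <x></x>, which is equivalent XML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Extra entities for text content; escape() already handles &, < and >.
TEXT_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def serialize(elem: ET.Element) -> str:
    """Serialize an element, writing ' and " in text as entity references."""
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in elem.attrib.items())
    parts = [f"<{elem.tag}{attrs}>"]
    if elem.text:
        parts.append(escape(elem.text, TEXT_ENTITIES))
    for child in elem:
        parts.append(serialize(child))
        # Text that follows a child element is stored as that child's tail.
        if child.tail:
            parts.append(escape(child.tail, TEXT_ENTITIES))
    parts.append(f"</{elem.tag}>")
    return "".join(parts)


def post_process(xml_text: str) -> str:
    """Re-serialize the XSLT output with &apos; and &quot; in text nodes."""
    root = ET.fromstring(xml_text)
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + serialize(root)
```

Ordinary &, < and > stay escaped, which covers the case disable-output-escaping cannot handle. Applied to the question's sample input, this should produce the wanted output shown in the question.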


Solution

  • Fixing the target system would of course be the preferable solution.

    Still, assuming that your processor supports disable-output-escaping, you should be able to do something like:

    XSLT 1.0

    <xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:strip-space elements="*"/>
    
    <!-- identity transform -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    
    <xsl:template match="att[@name='name']/text()">
        <xsl:variable name="escape-apos">
            <xsl:call-template name="replace">
                <xsl:with-param name="text" select="."/>
                <xsl:with-param name="searchString">'</xsl:with-param>
                <xsl:with-param name="replaceString">&amp;apos;</xsl:with-param>
            </xsl:call-template>
        </xsl:variable>
        <xsl:variable name="escape-quotes">
            <xsl:call-template name="replace">
                <xsl:with-param name="text" select="$escape-apos"/>
                <xsl:with-param name="searchString">"</xsl:with-param>
                <xsl:with-param name="replaceString">&amp;quot;</xsl:with-param>
            </xsl:call-template>
        </xsl:variable>
        <xsl:value-of select="$escape-quotes" disable-output-escaping="yes"/>
    </xsl:template>
        
    <xsl:template name="replace">
        <xsl:param name="text"/>
        <xsl:param name="searchString"/>
        <xsl:param name="replaceString"/>
        <xsl:choose>
            <xsl:when test="contains($text,$searchString)">
                <xsl:value-of select="substring-before($text,$searchString)"/>
                <xsl:value-of select="$replaceString"/>
                <xsl:call-template name="replace">
                    <xsl:with-param name="text" select="substring-after($text,$searchString)"/>
                    <xsl:with-param name="searchString" select="$searchString"/>
                    <xsl:with-param name="replaceString" select="$replaceString"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$text"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
    
    </xsl:stylesheet>
    

    However, this may seriously backfire if the text contains characters that should not be unescaped, for example an actual ampersand, which disable-output-escaping would write out as a bare &.


    Added:

    Here's a sketch of an approach that could work with text containing characters such as & or < that need to remain escaped:

    <xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:strip-space elements="*"/>
    
    <!-- identity transform -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    
    <xsl:template match="att[@name ='name']/text()">
        <xsl:call-template name="process">
            <xsl:with-param name="text" select="."/>
        </xsl:call-template>
    </xsl:template>
        
    <xsl:template name="process">
        <xsl:param name="text"/>
        <xsl:variable name="apos">'</xsl:variable>
        <xsl:variable name="quot">"</xsl:variable>
        <xsl:choose>
            <xsl:when test="contains($text, $apos) or contains($text, $quot)">
                <xsl:variable name="bef-apos" select="substring-before($text, $apos)"/>
                <xsl:variable name="bef-quot" select="substring-before($text, $quot)"/>
                <xsl:choose>
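                    <!-- test presence with contains(): substring-before() also returns '' when the character comes first -->
                    <xsl:when test="contains($text, $apos) and (not(contains($text, $quot)) or string-length($bef-apos) &lt; string-length($bef-quot))">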
                        <xsl:value-of select="$bef-apos"/>
                        <xsl:text disable-output-escaping="yes">&amp;apos;</xsl:text>
                        <xsl:call-template name="process">
                            <xsl:with-param name="text" select="substring-after($text, $apos)"/>
                        </xsl:call-template>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:value-of select="$bef-quot"/>
                        <xsl:text disable-output-escaping="yes">&amp;quot;</xsl:text>
                        <xsl:call-template name="process">
                            <xsl:with-param name="text" select="substring-after($text, $quot)"/>
                        </xsl:call-template>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$text"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
    
    </xsl:stylesheet>
    

    Caveat: not tested very thoroughly. Could probably be made more elegant with more work.
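The scanning logic of the process template can be modelled in Python to check edge cases. When picking whichever of ' and " comes first, the test has to check that the character is present rather than that its prefix is non-empty. substring-before() returns an empty string both when the character is absent and when it is the very first character. In the sketch below, the hypothetical process() function mirrors the template's branches. escape() stands in for the normal output escaping of xsl:value-of, and the entity strings stand in for the disable-output-escaping xsl:text instructions.

```python
from xml.sax.saxutils import escape


def process(text: str) -> str:
    """Python model of the XSLT 'process' template (hypothetical helper)."""
    out = []
    while "'" in text or '"' in text:
        i_apos = text.find("'")
        i_quot = text.find('"')
        # Decide on presence (index != -1), so a quote at position 0 works.
        if i_apos != -1 and (i_quot == -1 or i_apos < i_quot):
            out.append(escape(text[:i_apos]))   # like xsl:value-of $bef-apos
            out.append("&apos;")                # like the d-o-e xsl:text
            text = text[i_apos + 1:]
        else:
            out.append(escape(text[:i_quot]))
            out.append("&quot;")
            text = text[i_quot + 1:]
    out.append(escape(text))                    # the xsl:otherwise branch
    return "".join(out)


print(process("'abc"))           # &apos;abc
print(process('"x\''))           # &quot;x&apos;
print(process("Tom & Jerry's"))  # Tom &amp; Jerry&apos;s
```

The model produces the same result as escaping &, < and > normally and writing every ' and " as an entity reference. This is the output the template is meant to generate.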