I need to escape " and ' in an XML file to &quot; and &apos; using XSLT 1.0, because the consuming system cannot handle these characters inside an XML element.
I've run into two issues that I haven't solved. First, I can't match ' at all. Second, I can't get &quot; to stay escaped (without ruining the whole string). I'm now starting to believe this cannot be done. If I'm wrong, please let me know.
XML file:
<?xml version="1.0" encoding="UTF-8"?>
<foo>
<bar>
<att name="name">Let's fix these "errors".</att>
</bar>
</foo>
Wanted output would be something like this:
<?xml version="1.0" encoding="UTF-8"?>
<foo>
<bar>
<att name="name">Let&apos;s fix these &quot;errors&quot;.</att>
</bar>
</foo>
XSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="att[@name ='name']">
<xsl:param name="content" select="text()" />
<xsl:variable name="replaceApos">
<xsl:call-template name="string-replace-all">
<xsl:with-param name="text" select="$content" />
<xsl:with-param name="replace" select="'&apos;'" />
<xsl:with-param name="by" select="'&amp;apos;'" />
</xsl:call-template>
</xsl:variable>
<xsl:variable name="replaceQuot">
<xsl:call-template name="string-replace-all">
<xsl:with-param name="text" select="$replaceApos" />
<xsl:with-param name="replace" select="'&quot;'" />
<xsl:with-param name="by" select="'&amp;quot;'" />
</xsl:call-template>
</xsl:variable>
<att name="name"><xsl:value-of select="$replaceQuot"/></att>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<xsl:template name="string-replace-all">
<xsl:param name="text" />
<xsl:param name="replace" />
<xsl:param name="by" />
<xsl:choose>
<xsl:when test="$text = '' or $replace = '' or not($replace)">
<!-- Prevent this routine from hanging -->
<xsl:value-of select="$text" />
</xsl:when>
<xsl:when test="contains($text, $replace)">
<xsl:value-of select="substring-before($text,$replace)" />
<xsl:value-of select="$by" />
<xsl:call-template name="string-replace-all">
<xsl:with-param name="text" select="substring-after($text,$replace)" />
<xsl:with-param name="replace" select="$replace" />
<xsl:with-param name="by" select="$by" />
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$text" />
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
(Note: the XSL above won't validate, whether the apostrophe in the select attribute is written as '&apos;' or as '''.)
Q1. Regardless of whether I write the apostrophe as &apos;, as a literal ', or as a numeric character reference, my validator complains and won't allow the match. Is there any way around this? Is this expected?
Q2. If I skip apos and focus only on ": if I use by as &quot;, I get a plain " in the output. If I use by as &amp;quot;, I get &amp;quot; in the output, not just &quot;.
In my validator I can see that the variable replaceQuot is correct, i.e. Let's fix these &quot;errors&quot;., but the XML output is not "correct".
If I use <att name="name"><xsl:value-of select="$replaceQuot" disable-output-escaping="yes"/></att>, then everything comes out as intended (I get the expected, wanted text). What makes this solution impossible is that the field can also contain <, & and >, which I need to keep escaped (I need valid XML).
Is there any way to mitigate this? Character maps, which I guess could solve this, are only available from XSLT 2.0 as I understand it.
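For reference, if switching to an XSLT 2.0 processor (Saxon, for example) were an option, the character-map approach would look roughly like the minimal sketch below. It assumes a 2.0 processor; the map name escape-quotes is arbitrary, and the identity template is the same one used elsewhere in this post.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Requires XSLT 2.0 or later: use-character-maps does not exist in 1.0 -->
  <xsl:output method="xml" encoding="UTF-8" indent="yes"
              use-character-maps="escape-quotes"/>

  <!-- At serialization time, write each " and ' as the given string, verbatim.
       The attribute values below decode to the literal text &quot; and &apos;. -->
  <xsl:character-map name="escape-quotes">
    <xsl:output-character character="&quot;" string="&amp;quot;"/>
    <xsl:output-character character="'" string="&amp;apos;"/>
  </xsl:character-map>

  <!-- identity transform: &, < and other characters are still escaped normally -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Only the mapped characters bypass normal escaping, so & and < in the data still come out as &amp; and &lt;. Note that a character map applies to every text node and attribute value in the output, not only to att[@name='name'].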
My take on this is that this is impossible to manage with XSLT 1.0, and that I need to either push the consuming system to fix their import functionality or solve it in a post-processing step with another tool. Am I wrong?
Fixing the target system would of course be the preferable solution.
Still, assuming that your processor supports disable-output-escaping, you should be able to do something like:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="att[@name='name']/text()">
<xsl:variable name="escape-apos">
<xsl:call-template name="replace">
<xsl:with-param name="text" select="."/>
<xsl:with-param name="searchString">'</xsl:with-param>
<xsl:with-param name="replaceString">&amp;apos;</xsl:with-param>
</xsl:call-template>
</xsl:variable>
<xsl:variable name="escape-quotes">
<xsl:call-template name="replace">
<xsl:with-param name="text" select="$escape-apos"/>
<xsl:with-param name="searchString">"</xsl:with-param>
<xsl:with-param name="replaceString">&amp;quot;</xsl:with-param>
</xsl:call-template>
</xsl:variable>
<xsl:value-of select="$escape-quotes" disable-output-escaping="yes"/>
</xsl:template>
<xsl:template name="replace">
<xsl:param name="text"/>
<xsl:param name="searchString"/>
<xsl:param name="replaceString"/>
<xsl:choose>
<xsl:when test="contains($text,$searchString)">
<xsl:value-of select="substring-before($text,$searchString)"/>
<xsl:value-of select="$replaceString"/>
<xsl:call-template name="replace">
<xsl:with-param name="text" select="substring-after($text,$searchString)"/>
<xsl:with-param name="searchString" select="$searchString"/>
<xsl:with-param name="replaceString" select="$replaceString"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$text"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
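However, this may seriously backfire if the text contains characters that should not be unescaped. Because disable-output-escaping applies to the whole string, an actual ampersand (for example, Tom &amp; Jerry in the input) would be written out as a bare &, and the output would no longer be well-formed XML.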
Here's a sketch of an approach that could work with text that contains characters like & or < that need to remain escaped:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- identity transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="att[@name ='name']/text()">
<xsl:call-template name="process">
<xsl:with-param name="text" select="."/>
</xsl:call-template>
</xsl:template>
<xsl:template name="process">
<xsl:param name="text"/>
<xsl:variable name="apos">'</xsl:variable>
<xsl:variable name="quot">"</xsl:variable>
<xsl:choose>
<xsl:when test="contains($text, $apos) or contains($text, $quot)">
<xsl:variable name="bef-apos" select="substring-before($text, $apos)"/>
<xsl:variable name="bef-quot" select="substring-before($text, $quot)"/>
<xsl:choose>
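<!-- escape whichever quote character occurs first (also works when it is the first character) -->
<xsl:when test="contains($text, $apos) and (not(contains($text, $quot)) or string-length($bef-apos) &lt; string-length($bef-quot))">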
<xsl:value-of select="$bef-apos"/>
<xsl:text disable-output-escaping="yes">&amp;apos;</xsl:text>
<xsl:call-template name="process">
<xsl:with-param name="text" select="substring-after($text, $apos)"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$bef-quot"/>
<xsl:text disable-output-escaping="yes">&amp;quot;</xsl:text>
<xsl:call-template name="process">
<xsl:with-param name="text" select="substring-after($text, $quot)"/>
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$text"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
Caveat: not tested very thoroughly. Could probably be made more elegant with more work.
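Regarding Q1: the validator errors are expected. The XML parser resolves &apos; (or a numeric reference) inside the select attribute before XPath ever sees it, so select="'&apos;'" becomes the XPath expression ''', and XPath 1.0 string literals have no way to escape their own delimiter. The stylesheets above sidestep this by putting the apostrophe in element content instead of an XPath literal. A minimal standalone sketch of the two usual XSLT 1.0 workarounds (the variable names apos and apos2 are only for illustration):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Option 1: the apostrophe is element content, so no XPath literal is involved -->
  <xsl:variable name="apos">'</xsl:variable>

  <!-- Option 2: an XPath literal delimited by double quotes; inside a
       "-delimited attribute those quotes must be written as &quot; -->
  <xsl:variable name="apos2" select="&quot;'&quot;"/>

  <xsl:template match="/">
    <!-- Both variables hold the single character ' and work in contains() etc. -->
    <xsl:for-each select="//att">
      <xsl:value-of select="concat(@name, ': ', contains(., $apos), ' ', contains(., $apos2), '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample input from the question, both tests should report true for the att element.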