I have an HTML document that might have &lt; and &gt; in some of the attributes. I am trying to extract this and run it through an XSLT, but the XSLT engine errors, telling me that < is not valid inside of an attribute.
I did some digging and found that it is properly escaped in the source document, but when this is loaded into the DOM via innerHTML, the DOM decodes the attributes. Strangely, it does this for &lt; and &gt;, but not for some others like &amp;.
Here is a simple example:
var div = document.createElement('DIV');
div.innerHTML = '<div asdf="&lt;50" fdsa="&amp;50"></div>';
console.log(div.innerHTML);
I'm assuming that the DOM implementation decided that HTML attributes can be less strict than XML attributes, and that this is "working as intended". My question is, can I work around this without writing some horrible regex replacement?
What ended up working best for me was to double-escape these using an XSLT on the incoming document (and reverse this on the outgoing doc).
So &lt; in an attribute becomes &amp;lt;. Thanks to @Abel for the suggestion.
Here is the XSLT I added, in case others find it helpful:
First is a template for doing string replacements in XSLT 1.0. If you can use XSLT 2.0, you can use the built-in replace() function instead.
<xsl:template name="string-replace-all">
  <xsl:param name="text"/>
  <xsl:param name="replace"/>
  <xsl:param name="by"/>
  <xsl:choose>
    <xsl:when test="contains($text, $replace)">
      <xsl:value-of select="substring-before($text,$replace)"/>
      <xsl:value-of select="$by"/>
      <xsl:call-template name="string-replace-all">
        <xsl:with-param name="text" select="substring-after($text,$replace)"/>
        <xsl:with-param name="replace" select="$replace"/>
        <xsl:with-param name="by" select="$by"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$text"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
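For anyone who finds the recursion easier to follow outside of XSLT, here is a rough JavaScript sketch of what the template above does. The function name stringReplaceAll is my own, not part of any library, and this is only meant to illustrate the logic.

```javascript
// Hypothetical helper mirroring the XSLT 1.0 "string-replace-all" template.
function stringReplaceAll(text, replace, by) {
  const index = text.indexOf(replace);
  // Base case (the xsl:otherwise branch): nothing left to replace.
  // An empty search string is treated as "no match" to avoid endless recursion.
  if (replace === '' || index === -1) {
    return text;
  }
  // Same pieces as substring-before() and substring-after() in the template.
  const before = text.slice(0, index);
  const after = text.slice(index + replace.length);
  // Only the remainder is processed again, so a replacement that contains
  // the search string (e.g. '&' -> '&amp;') does not loop forever.
  return before + by + stringReplaceAll(after, replace, by);
}

console.log(stringReplaceAll('a<b<c', '<', '&lt;'));
```

Like the XSLT version, this recurses once per match, so a value with a very large number of matches could run into recursion limits.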
Next are the templates that do the specific replacements that I need:
<!-- xml -> html -->
<xsl:template name="replace-html-codes">
  <xsl:param name="text"/>
  <xsl:variable name="lt">
    <xsl:call-template name="string-replace-all">
      <xsl:with-param name="text" select="$text"/>
      <xsl:with-param name="replace" select="'&lt;'"/>
      <xsl:with-param name="by" select="'&amp;lt;'"/>
    </xsl:call-template>
  </xsl:variable>
  <xsl:variable name="gt">
    <xsl:call-template name="string-replace-all">
      <xsl:with-param name="text" select="$lt"/>
      <xsl:with-param name="replace" select="'&gt;'"/>
      <xsl:with-param name="by" select="'&amp;gt;'"/>
    </xsl:call-template>
  </xsl:variable>
  <xsl:value-of select="$gt"/>
</xsl:template>
<!-- html -> xml -->
<xsl:template name="restore-html-codes">
  <xsl:param name="text"/>
  <xsl:variable name="lt">
    <xsl:call-template name="string-replace-all">
      <xsl:with-param name="text" select="$text"/>
      <xsl:with-param name="replace" select="'&amp;lt;'"/>
      <xsl:with-param name="by" select="'&lt;'"/>
    </xsl:call-template>
  </xsl:variable>
  <xsl:variable name="gt">
    <xsl:call-template name="string-replace-all">
      <xsl:with-param name="text" select="$lt"/>
      <xsl:with-param name="replace" select="'&amp;gt;'"/>
      <xsl:with-param name="by" select="'&gt;'"/>
    </xsl:call-template>
  </xsl:variable>
  <xsl:value-of select="$gt"/>
</xsl:template>
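To make the round trip concrete, here is a small JavaScript sketch of the same pair of replacements, working on the already-parsed attribute value. The names replaceHtmlCodes and restoreHtmlCodes are just my own stand-ins for the two templates above. This is an illustration of the idea, not the code I actually run.

```javascript
// Hypothetical stand-in for "replace-html-codes": turn literal angle
// brackets into the text "&lt;" / "&gt;" so they survive HTML parsing.
function replaceHtmlCodes(text) {
  return text.split('<').join('&lt;').split('>').join('&gt;');
}

// Hypothetical stand-in for "restore-html-codes": reverse the step above.
function restoreHtmlCodes(text) {
  return text.split('&lt;').join('<').split('&gt;').join('>');
}

const original = '<50 and >10';
const escaped = replaceHtmlCodes(original);
console.log(escaped);
console.log(restoreHtmlCodes(escaped) === original);
```

One caveat with this approach, as far as I can tell: if an attribute value already contains the literal text &lt; before escaping, the restore step cannot tell it apart from an escaped bracket, so the round trip is not lossless in that case. That was acceptable for my documents.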
The XSLT is mostly a pass-through. I just call the appropriate template when copying attributes:
<xsl:template match="@*">
  <xsl:attribute name="data-{local-name()}">
    <xsl:call-template name="replace-html-codes">
      <xsl:with-param name="text" select="."/>
    </xsl:call-template>
  </xsl:attribute>
</xsl:template>
<!-- copy all nodes -->
<xsl:template match="node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>