Tags: javascript, html, xml, xslt, innerhtml

innerHTML unencodes < in attributes


I have an HTML document that may contain &lt; and &gt; in some of its attributes. I am trying to extract this markup and run it through an XSLT, but the XSLT engine throws an error saying that < is not valid inside an attribute.

I did some digging and found that the attributes are properly escaped in the source document, but once the markup is loaded into the DOM via innerHTML and read back, the attribute values are unescaped. Strangely, this happens for &lt; and &gt;, but not for others such as &amp;.

Here is a simple example:

var div = document.createElement('DIV');
div.innerHTML = '<div asdf="&lt;50" fdsa="&amp;50"></div>';
// compare how the two attributes are serialized when innerHTML is read back
console.log(div.innerHTML);
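The DOM itself stores decoded attribute values (getAttribute('asdf') on the inner div returns "<50"); the difference shows up when innerHTML serializes them again. Below is a DOM-free sketch of that round trip, assuming (as the example suggests) that the serializer re-escapes only `&`, `"` and U+00A0 in attribute values and leaves `<` and `>` alone. `decodeEntities`, `serializeAttributeValue` and `roundTrip` are hypothetical helpers that handle only the few entities used here; this is a simplified model, not browser code.

```javascript
// Simplified model of the innerHTML round trip for one attribute value.
// Hypothetical helpers for illustration only; not a browser implementation.

// Parsing: the HTML parser turns character references into plain characters.
// Only the handful of named references relevant to this question are handled.
const NAMED_REFS = { lt: "<", gt: ">", amp: "&", quot: '"', nbsp: "\u00A0" };

function decodeEntities(text) {
  // Single pass, so "&amp;lt;" decodes to the literal text "&lt;", not to "<".
  return text.replace(/&(lt|gt|amp|quot|nbsp);/g, (_, name) => NAMED_REFS[name]);
}

// Serializing: assumed behaviour of the innerHTML getter for attribute values.
// "&" is escaped first so the entities added afterwards are not re-escaped.
// "<" and ">" are deliberately left untouched in this model.
function serializeAttributeValue(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/\u00A0/g, "&nbsp;")
    .replace(/"/g, "&quot;");
}

// Parse an attribute value from source markup, then read it back out.
function roundTrip(sourceValue) {
  return serializeAttributeValue(decodeEntities(sourceValue));
}

console.log(roundTrip("&lt;50"));  // prints <50 (the entity is lost)
console.log(roundTrip("&amp;50")); // prints &amp;50 (& is re-escaped)
```

Under the same model, a source value of `&amp;lt;50` decodes to the text `&lt;50` and serializes back to `&amp;lt;50` unchanged. The model reflects the behaviour described in this question; check the serializers of the browsers you target.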

I'm assuming that the DOM implementation decided that HTML attributes can be less strict than XML attributes, and that this is "working as intended". My question is: can I work around this without writing some horrible regex replacement?


Solution

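  • What ended up working best for me was to double-escape these characters with an XSLT applied to the incoming document (and to reverse the process on the outgoing document).

    So &lt; in an attribute becomes &amp;lt;. After parsing, the attribute value is then the literal text &lt;, and because innerHTML does re-escape &, it survives the round trip as &amp;lt;. Thanks to @Abel for the suggestion.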

    Here is the XSLT I added, in case others find it helpful:

    First is a template for doing string replacements in XSLT 1.0. If you can use XSLT 2.0, you can use the built-in replace() function instead.

    <xsl:template name="string-replace-all">
        <xsl:param name="text"/>
        <xsl:param name="replace"/>
        <xsl:param name="by"/>
        <xsl:choose>
            <xsl:when test="contains($text, $replace)">
                <xsl:value-of select="substring-before($text,$replace)"/>
                <xsl:value-of select="$by"/>
                <xsl:call-template name="string-replace-all">
                    <xsl:with-param name="text" select="substring-after($text,$replace)"/>
                    <xsl:with-param name="replace" select="$replace"/>
                    <xsl:with-param name="by" select="$by"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$text"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
    

    Next are the templates that perform the specific replacements I need:

    <!-- xml -> html -->
    <xsl:template name="replace-html-codes">
        <xsl:param name="text"/>
        <xsl:variable name="lt">
            <xsl:call-template name="string-replace-all">
                <xsl:with-param name="text" select="$text"/>
                <xsl:with-param name="replace" select="'&lt;'"/>
                <xsl:with-param name="by" select="'&amp;lt;'"/>
            </xsl:call-template>
        </xsl:variable>
        <xsl:variable name="gt">
            <xsl:call-template name="string-replace-all">
                <xsl:with-param name="text" select="$lt"/>
                <xsl:with-param name="replace" select="'&gt;'"/>
                <xsl:with-param name="by" select="'&amp;gt;'"/>
            </xsl:call-template>
        </xsl:variable>
        <xsl:value-of select="$gt"/>
    </xsl:template>
    
    <!-- html -> xml -->
    <xsl:template name="restore-html-codes">
        <xsl:param name="text"/>
        <xsl:variable name="lt">
            <xsl:call-template name="string-replace-all">
                <xsl:with-param name="text" select="$text"/>
                <xsl:with-param name="replace" select="'&amp;lt;'"/>
                <xsl:with-param name="by" select="'&lt;'"/>
            </xsl:call-template>
        </xsl:variable>
        <xsl:variable name="gt">
            <xsl:call-template name="string-replace-all">
                <xsl:with-param name="text" select="$lt"/>
                <xsl:with-param name="replace" select="'&amp;gt;'"/>
                <xsl:with-param name="by" select="'&gt;'"/>
            </xsl:call-template>
        </xsl:variable>
        <xsl:value-of select="$gt"/>
    </xsl:template>
    

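    The rest of the XSLT is mostly a pass-through: I just call the appropriate template when copying attributes. Note that this template also renames each attribute to data-<name>; drop the data- prefix from the generated name if you want to keep the original attribute names.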

    <xsl:template match="@*">
        <xsl:attribute name="data-{local-name()}">
            <xsl:call-template name="replace-html-codes">
                <xsl:with-param name="text" select="."/>
            </xsl:call-template>
        </xsl:attribute>
    </xsl:template>
    
    <!-- copy all nodes -->
    <xsl:template match="node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
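The two named templates are just chained string replacements on the already-parsed attribute value (so `<` there is a real character, which is how XSLT sees `&lt;` after parsing). As a cross-check of what they compute, here is a rough JavaScript equivalent. `stringReplaceAll`, `replaceHtmlCodes` and `restoreHtmlCodes` are illustrative names mirroring the template names, not part of any library.

```javascript
// Mirrors the recursive XSLT 1.0 "string-replace-all" template:
// emit the text before the first match, then the replacement, then recurse on the rest.
function stringReplaceAll(text, replace, by) {
  const i = text.indexOf(replace);
  // The empty-string guard is added here; the XSLT version assumes a non-empty
  // search string (with '' it would recurse forever).
  if (replace === "" || i === -1) {
    return text; // nothing (more) to replace
  }
  return text.slice(0, i) + by + stringReplaceAll(text.slice(i + replace.length), replace, by);
}

// xml -> html: turn the characters < and > into the literal text &lt; and &gt;.
// When written into an attribute, the & is escaped again, giving &amp;lt; in the markup.
function replaceHtmlCodes(value) {
  const lt = stringReplaceAll(value, "<", "&lt;");
  return stringReplaceAll(lt, ">", "&gt;");
}

// html -> xml: turn the literal text &lt; and &gt; back into < and >.
function restoreHtmlCodes(value) {
  const lt = stringReplaceAll(value, "&lt;", "<");
  return stringReplaceAll(lt, "&gt;", ">");
}

console.log(replaceHtmlCodes("<50 && >10"));       // prints &lt;50 && &gt;10
console.log(restoreHtmlCodes("&lt;50 && &gt;10")); // prints <50 && >10
```

Writing it out this way also makes one limitation visible, and it applies equally to the XSLT templates: the mapping is not reversible when a value already contains the literal text `&lt;` (source markup `&amp;lt;`). `replaceHtmlCodes` leaves it untouched, and `restoreHtmlCodes` then turns it into `<`.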