html xml xhtml cdata

Why is the CDATA section in my HTML not rendering?


I am writing a report about XML injection attacks in HTML, so I am going to have (mangled) HTML as the content of my HTML. I am trying to wrap that content in CDATA blocks, but it does not seem to be rendering properly.

I have the (validated by W3C) document:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>report</title>
    </head>
    <body>
        <div><![CDATA[AuthType=<META HTTP-EQUIV="Set-Cookie" Content="USERID=&lt;SCRIPT&gt;alert('XSS')&lt;/SCRIPT&gt;">]]></div>
    </body>
</html>

From my understanding of the Wikipedia article this means that the content should be "marked for the parser to interpret as only character data, not markup". So the output should be

AuthType=<META HTTP-EQUIV="Set-Cookie" Content="USERID=&lt;SCRIPT&gt;alert('XSS')&lt;/SCRIPT&gt;">

However, in both Chrome 21.0.1180.60 m and Firefox 14.0.1 all that renders is

]]>

What is going on? Shouldn't everything from the <![CDATA[ to the first ]]> appear on screen as if every character had been escaped?


Solution

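  • CDATA sections are recognized by browsers only in XML parsing mode. In legacy HTML mode, strange things happen, as you have seen: the HTML parser treats <![CDATA[ as the start of a bogus comment that ends at the first >, so everything up to the end of the META tag is swallowed and only the trailing ]]> is left to render as text.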

    So you would need to configure the server to send the document with an XHTML Content-Type (application/xhtml+xml). This, in turn, would prevent old versions of IE (up to IE 8) from rendering the document at all.
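
As a rough illustration of sending an XHTML Content-Type, here is a minimal sketch using Python's standard http.server module. It assumes the report is saved with a .xhtml extension in the directory being served; the names XhtmlHandler and serve are hypothetical, and a production server would be configured through its own settings instead.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


class XhtmlHandler(SimpleHTTPRequestHandler):
    """Serve .xhtml files as XML so that browsers use their XML parser."""

    # Copy the default extension map and add an explicit entry for .xhtml
    # (the file extension is an assumption made for this sketch).
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".xhtml": "application/xhtml+xml",
    }


def serve(port: int = 8000) -> None:
    """Serve the current directory on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), XhtmlHandler).serve_forever()
```

Calling serve() and opening the .xhtml file through that local server should make the browser parse it as XML, in which case the CDATA section is treated as character data.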

    The practical ways of displaying HTML tags as content of an HTML document are: a) Present each < as &lt; and each & as &amp;. This works in legacy HTML and in XHTML. b) Wrap the data in an xmp element. This works in legacy HTML only, so it requires a non-XML Content-Type; merely declaring an XHTML doctype doesn't matter, since it gets ignored anyway. Example:

    <xmp>AuthType=<META HTTP-EQUIV="Set-Cookie" Content="USERID=&lt;SCRIPT&gt;alert('XSS')&lt;/SCRIPT&gt;"></xmp>
    

    The xmp markup also implies a monospace font and pre-like rendering where whitespace is honored. But these can be overridden with CSS. The xmp element was dropped from HTML specs long ago but is supported by browsers quite well.
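
Option a) can also be done programmatically instead of by hand. The following is a minimal sketch, assuming the report is generated with Python and using the standard library's html.escape; the helper name escape_for_html and the payload variable are hypothetical and only serve the illustration.

```python
import html

# The attack string from the question, exactly as it should appear on screen.
payload = """AuthType=<META HTTP-EQUIV="Set-Cookie" Content="USERID=&lt;SCRIPT&gt;alert('XSS')&lt;/SCRIPT&gt;">"""


def escape_for_html(text: str) -> str:
    """Escape &, < and > so the text displays literally in HTML or XHTML."""
    # quote=False leaves quotation marks alone, which is fine for element content.
    return html.escape(text, quote=False)


escaped = escape_for_html(payload)

# Wrap the escaped text in the div from the question's document.
print(f"<div>{escaped}</div>")
```

Because each & is escaped as well, the &lt; sequences already present in the payload become &amp;lt; in the markup and are displayed as the literal text &lt;, matching the expected output in the question.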