Browser splits up <p> and nested <code> blocks that contain nested <p> blocks. Why?

You can see my problem in this jsFiddle.

I tried using <code> tags to distinguish special content, but this quickly backfired on me (as you can see in the above link). When I use Firebug to look at the content, this:

<p>
    This is a sample paragraph with a code block:
    <code>
        <p> Some line of code </p>
        <p> Another line of code </p>
    </code>
</p>

has turned into this:

<p>
    This is a sample paragraph with a code block:
    <code> </code>
</p>
<p>
    <code> Some line of code </code>
</p>
<code>
    <p> Another line of code </p>
</code>

Now, this can be solved by changing <code> to <div class="code"> (as seen in this jsFiddle), but why did the browser do this in the first place, and why did it do it only to the first section in each paragraph?

Firefox, Opera, Chrome, Internet Explorer, Safari - all of them do this, but I'd really like to know why. Does it happen with code only, or will it do this with other tags? And why would browsers move tags around like that?

Solution

HTML places certain restrictions on which elements can be nested in which other elements. Sometimes browsers will happily construct a nonsensical DOM out of certain nesting scenarios, such as a <div> directly in a <ul>. Other times they absolutely can't, because of other written or unwritten parsing rules, such as <p> elements never containing any other block elements, not even other <p> elements (this is implied by the spec). In those cases they have to work around it by changing the DOM to something that they can work with, resulting in the behavior you observe.

Because you cannot nest <p> elements within one another at all, what's happening here is that this element:

    <p> Some line of code </p>

is causing this other element to be terminated:

<p>
    This is a sample paragraph with a code block:
    <code>

Since there's an empty <code> tag in there, it's closed, and the containing <p> is closed as well, because a subsequent <p> start tag will automatically close a preceding <p> that is still open:

<p>
    This is a sample paragraph with a code block:
    <code> </code>
</p>
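As a smaller illustration of that auto-closing rule, here is a hypothetical two-paragraph snippet (not from the question) in which neither <p> has an end tag. This is only a sketch, with the parser's result written out by hand:

```html
<!-- As written: the second <p> start tag arrives while the first <p> is still open -->
<p>First paragraph
<p>Second paragraph

<!-- Roughly what the parser builds (whitespace aside): two siblings, not a nested pair -->
<p>First paragraph</p>
<p>Second paragraph</p>
```

The same thing happens in the question's markup, except that the open <code> sits between the two <p> start tags and gets cut off along with the outer paragraph.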

At this point a browser has to deal with the fact that the <code> and <p> tags are now effectively in the wrong order, but still nested. To compensate for the restructuring of the first "outer" <p> element, and the fact that there was going to be a <code> tag before the second "inner" <p>, it inserts <code> tags into the second <p>, turning its contents into code:

<p>
    <code> Some line of code </code>
</p>

Since browsers do seem to allow <p> within <code> for whatever reason (note that at this point the <code> is still not yet explicitly terminated with a </code>), the browser builds the rest of the DOM as follows, before continuing on its way:

<code>
    <p> Another line of code </p>
</code>
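If you would like to watch the restructuring without a tool like Firebug, a test page along these lines may help. It is a minimal sketch that assumes a browser supporting DOMParser with the text/html type; the element id "out" and the variable names are made up for this example:

```html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Parsed DOM inspection sketch</title>
</head>
<body>
<!-- Hypothetical output area for the serialized result -->
<pre id="out"></pre>
<script>
  // The invalid markup from the question, kept in a string so that
  // this test page itself stays valid.
  var source =
    '<p>This is a sample paragraph with a code block:' +
    '<code><p> Some line of code </p><p> Another line of code </p></code></p>';

  // Run the browser's HTML parser on the string.
  var doc = new DOMParser().parseFromString(source, 'text/html');

  // Use textContent rather than innerHTML so the markup is displayed as text.
  document.getElementById('out').textContent = doc.body.innerHTML;
</script>
</body>
</html>
```

The serialized markup shown on the page is what the parser actually built, so it can be compared step by step with the fragments above.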

This is probably consistent across browsers for legacy and cross-browser compatibility reasons; some of these legacy parsing rules have been retconned into sections of the HTML5 spec as well. Unfortunately, I'm not a browser implementer so I can't list out all possible scenarios; on the other hand, it's unwise to rely on such details considering the markup you're writing is invalid in the first place.
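If the goal is simply a block of code lines set apart from the prose, one way to stay within the nesting rules is to end the paragraph before the block starts. The sketch below reuses the <div class="code"> idea from the question; the class name is only a styling hook and is assumed to be defined in your own CSS:

```html
<!-- The paragraph is closed explicitly before any block-level content begins -->
<p>
    This is a sample paragraph with a code block:
</p>
<!-- A <div> may contain <p> elements, so the parser has nothing to rearrange -->
<div class="code">
    <p> Some line of code </p>
    <p> Another line of code </p>
</div>
```

Inline <code> still works for short fragments inside a paragraph, as long as it contains only text and other inline content.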

