2006-01-25 06:12 UTC Tag Soup: Blocks-in-inlines
Imagine the following (invalid) markup:
<!DOCTYPE html><em><p>XY</p></em>
What should the DOM look like? The general consensus is that the DOM should look like this:
- DOCTYPE: html
- HTML
  - HEAD
  - BODY
    - EM
      - P
        - #text: XY
That is, the p element should be completely inside (that is, a child of) the em element.
No problem so far.
Now consider this markup:
<!DOCTYPE html><em><p>X</em>Y</p>
What should the DOM look like?
This is where things start getting hairy. I've covered a similar case before, so I'll just summarise the results:
- Windows Internet Explorer
  The DOM is not a tree. The text node for the "Y" is a child of both the p element and the body element. Violates the DOM Core specifications.
- Opera
  The DOM is a simple tree, the same as for the first case, but the "Y" is not emphasised. Violates the CSS specifications.
- Mozilla and Safari
  The DOM looks like this:
  - DOCTYPE: html
  - HTML
    - HEAD
    - BODY
      - EM
      - P
        - EM
          - #text: X
        - #text: Y
  ...which basically means that malformed invalid markup gets handled differently than well-formed invalid markup.
In the past, I would have stopped here, made some wry comment about the insanity that is the Web, and called it a day.
But I'm trying to spec this. Stopping is not an option.
What IE does is insane. What Opera does is also insane. Neither of those options is something that I can put in a specification with a straight face.
This leaves the Mozilla/Safari method.
It's weird, though. If you look at the two examples above, you'll notice that both of them start with the same markup:
<!DOCTYPE html><em><p>X
Yet the end result is quite different, with one of the elements (the p) having different parents in the two cases. So when do the browsers decide what to do? They can't be buffering content up and deciding what to do later, since that would break incremental rendering. So what exactly is going on?
Well, let's check. What do Mozilla and Safari do for that truncated piece of markup?
- Mozilla
  - DOCTYPE: html
  - HTML
    - HEAD
    - BODY
      - EM
      - P
        - EM
          - #text: X
- Safari
  - HTML
    - BODY
      - EM
        - P
          - #text: X
Hrm. They disagree. Mozilla is using the "malformed" version, and Safari is using the "well-formed" version. Why? How do they decide?
Let's look at Safari first, by running a script while the parser is running. First, the simple case:
<!DOCTYPE html>
<em>
 <p>
  XY
  <script>
   var p = document.getElementsByTagName('p')[0];
   p.title = p.parentNode.tagName;
  </script>
 </p>
</em>
Result:
- HTML
  - BODY
    - EM
      - #text:
      - P title="EM"
        - #text: XY
        - SCRIPT
          - #text: var p = document.getElementsByTagName('p')[0]; p.title = p.parentNode.tagName;
        - #text:
      - #text:
Exactly as we'd expect. The parentNode of the p element as shown in the DOM tree view is the same as shown in the title attribute value, namely, the em element.
Now let's try the bad markup case:
<!DOCTYPE html>
<em>
 <p>
  X
  <script>
   var p = document.getElementsByTagName('p')[0];
   p.title = p.parentNode.tagName;
  </script>
 </em>
 Y
</p>
Result:
- HTML
  - BODY
    - EM
      - #text:
    - P title="EM"
      - EM
        - #text: X
        - SCRIPT
          - #text: var p = document.getElementsByTagName('p')[0]; p.title = p.parentNode.tagName;
        - #text:
      - #text: Y
Wait, what?
When the embedded script ran, the parent of the p was the em, but when the parser had finished, the DOM had changed, and the parent was no longer the em node!
If we look a little closer:
<!DOCTYPE html>
<em>
 <p>
  X
  <script>
   var p = document.getElementsByTagName('p')[0];
   p.setAttribute('a', p.parentNode.tagName);
  </script>
 </em>
 Y
 <script>
  var p = document.getElementsByTagName('p')[0];
  p.setAttribute('b', p.parentNode.tagName);
 </script>
</p>
...we find:
- HTML
  - BODY
    - EM
      - #text:
    - P a="EM" b="BODY"
      - EM
        - #text: X
        - SCRIPT
          - #text: var p = document.getElementsByTagName('p')[0]; p.setAttribute('a', p.parentNode.tagName);
        - #text:
      - #text: Y
      - SCRIPT
        - #text: var p = document.getElementsByTagName('p')[0]; p.setAttribute('b', p.parentNode.tagName);
      - #text:
...which is to say, the parent changes half way through! (Compare the a and b attributes.)
What actually happens is that Safari notices that something bad has happened, and moves the element around in the DOM. After the fact. (If you remove the p element from the DOM in that first script block, then Safari crashes.)
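To make the after-the-fact move concrete, here is a minimal sketch in JavaScript on a toy node model (plain objects, not the real DOM, and certainly not Safari's actual code). The helpers makeNode, append, insertAfter, outline and reparentOnMisnestedClose are all hypothetical names invented for illustration, and whitespace text nodes are left out. It only tries to reproduce the observable effect: the p's parent is the em before the </em> is seen, and the body afterwards.

```javascript
// Toy node: just enough structure to model parent/child links.
function makeNode(name, text) {
  return { name, text: text || null, parent: null, children: [] };
}

// Remove a node from its current parent, if it has one.
function detach(node) {
  if (node.parent) {
    const siblings = node.parent.children;
    siblings.splice(siblings.indexOf(node), 1);
    node.parent = null;
  }
}

// Append child as the last child of parent (a node only ever has one parent).
function append(parent, child) {
  detach(child);
  child.parent = parent;
  parent.children.push(child);
  return child;
}

// Insert node as the next sibling of ref.
function insertAfter(ref, node) {
  detach(node);
  const siblings = ref.parent.children;
  siblings.splice(siblings.indexOf(ref) + 1, 0, node);
  node.parent = ref.parent;
}

// Hypothetical fix-up, run when </em> arrives while <p> is still open:
// wrap the p's existing children in a fresh EM so they stay emphasised,
// then move the p out to become the original EM's next sibling.
function reparentOnMisnestedClose(em, p) {
  const wrapper = makeNode(em.name);
  for (const child of p.children.slice()) {
    append(wrapper, child);
  }
  append(p, wrapper);
  insertAfter(em, p);
}

// Compact one-line rendering of a subtree, for inspection.
function outline(node) {
  const label = node.text ? node.name + ':' + node.text : node.name;
  return node.children.length
    ? label + '(' + node.children.map(outline).join(',') + ')'
    : label;
}

// Replay <em><p>X</em>Y</p>, sampling the p's parent where the two scripts ran.
const body = makeNode('BODY');
const em = append(body, makeNode('EM'));
const p = append(em, makeNode('P'));
append(p, makeNode('#text', 'X'));
const parentBefore = p.parent.name;   // what the first script would see
reparentOnMisnestedClose(em, p);      // the parser hits </em>
append(p, makeNode('#text', 'Y'));
const parentAfter = p.parent.name;    // what the second script would see

console.log(parentBefore, parentAfter); // EM BODY
console.log(outline(body));             // BODY(EM,P(EM(#text:X),#text:Y))
```

The point of the sketch is that a script holding a reference to the p sees two different parents at two different times, even though the final tree is perfectly well-formed.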
How about Mozilla? Let's try the same trick. The result:
- DOCTYPE: html
- HTML
  - HEAD
  - BODY
    - EM
      - #text:
    - P a="BODY" b="BODY"
      - #text:
      - EM
        - #text: X
        - SCRIPT
          - #text: var p = document.getElementsByTagName('p')[0]; p.setAttribute('a', p.parentNode.tagName);
        - #text:
      - #text: Y
      - SCRIPT
        - #text: var p = document.getElementsByTagName('p')[0]; p.setAttribute('b', p.parentNode.tagName);
      - #text:
It doesn't reparent the node. So what does Mozilla do?
It turns out that Mozilla does a pre-parse of the source, and if a part of it is well-formed, it creates a well-formed tree for it, but if the markup isn't well-formed, or if there are any script blocks, or, for that matter, if the TCP/IP packet boundary happens to fall in the wrong place, or if you write the document out in two document.write()s instead of one, then it'll make the more thorough nesting that handles ill-formed content.
Who would have thought that you would find Heisenberg-like quantum effects in an HTML parser? I mean, I knew they were obscure, but this is just taking the biscuit.
The problem is I now have to determine which of these four options to make the other three browsers implement (that is, which do I put in the spec). What do you think is the most likely to be accepted by the others? As a reminder, the options are incestuous elements that can be their own uncles, elements who have secret lives in the rendering engine, elements that change their mind about who their parents are halfway through their childhood, and quantum elements whose parents change depending on whether you observe their birth or not.
The key requirements are probably:
- Coherence: scripts that rely on DOM invariants (like the fact that the DOM is a tree) shouldn't go off into infinite loops.
- Transparency: we shouldn't have to describe a whole extra section that explains how the CSS rendering engine applies to HTML DOMs; CSS should just work on the real DOM as you would see it from script.
- Predictability: it shouldn't depend on, e.g., the protocol or network conditions — every browser should get the same DOM for the same original markup in all situations.
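The coherence requirement can be stated as a check that a script could actually run. Below is a rough sketch in JavaScript over plain objects (not real DOM nodes): a walk from the root must reach every node exactly once. The names node and isTree are made up for this example, and the IE-like shape is only my reading of the description earlier in this post (the "Y" text node reachable from both the p and the body), not a dump from a real browser.

```javascript
// Toy node: a name and an ordered list of children.
const node = (name, children = []) => ({ name, children });

// True if every node reachable from root is reached exactly once,
// i.e. no node is shared between two parents and there are no cycles.
function isTree(root) {
  const seen = new Set();
  const stack = [root];
  while (stack.length > 0) {
    const current = stack.pop();
    if (seen.has(current)) {
      return false; // reached a second time: not a tree
    }
    seen.add(current);
    for (const child of current.children) {
      stack.push(child);
    }
  }
  return true;
}

// Final shape reported above for Mozilla and Safari.
const mozillaSafariShape = node('BODY', [
  node('EM'),
  node('P', [node('EM', [node('#text: X')]), node('#text: Y')]),
]);

// Assumed IE-like shape: the same "Y" node hangs off both P and BODY.
const sharedY = node('#text: Y');
const ieLikeShape = node('BODY', [
  node('EM', [node('P', [node('#text: X'), sharedY])]),
  sharedY,
]);

console.log(isTree(mozillaSafariShape)); // true
console.log(isTree(ieLikeShape));        // false
```

A script that walks the second structure naively visits the "Y" twice, which is exactly the kind of broken invariant the coherence requirement is meant to rule out.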
The least bad option is probably the Safari-style on-the-fly reparenting, I think, but I'm not sure. It's the only one that fits those requirements. Is there a fifth option I'm missing?