Hixie's Natural Log

2004-01-22 00:09 UTC Error handling and Web language design

I've been following, with some amusement, the recent burst of posts about whether XML should have required Web browsers to stop processing upon hitting an error (as it does) or should have let Web browsers recover from errors in vendor-specific ways (as HTML does). Asking the question in this yes/no form misses the point:

There is a third, better option.

Since a lot of people don't really understand the problem here, I'm going to give some background.

What's the point of a specification? It is to ensure interoperability, so that authors get the same results on every product that supports the technology.

Why would we ever have to worry about document errors? Murphy said it best:

If there are two or more ways to do something, and one of those ways can result in a catastrophe, then someone will do it.

Authors will write invalid documents. Most Web developers do not really grasp this, especially developers who know the specs well enough to tell what makes a document invalid. Ask someone who does HTML/CSS quality assurance (QA) for a Web browser, or who has written code for a browser's layout engine. They'll go on at length about the insanities that they have seen, but the short version is that pretty much any random stream of characters has been written by someone somewhere and been labelled as HTML.

Why is this a problem? Because Tim Berners-Lee, and later Dan Connolly, when they wrote the original specs for HTML and HTTP, did not specify what should happen with invalid documents. This wasn't a problem for the first five or so years of the Web.

At the start, there was no really dominant browser, so browsers presumably just implemented the specs and left the error handling to chance or to the convenience of the implementor. After a few years, though, when the Web started taking off, Netscape's browser soared to a dominant position. The result was that Web authors pretty much all wrote their documents using Netscape. Still no real problem, though: Netscape's engineers didn't need to spend much time on error handling, so long as they didn't change it much between releases.

Then, around the mid-nineties, Microsoft entered the scene. In order to get users, they had to make sure that their browser rendered all the Web pages in the World Wide Web. Unfortunately, at this point, it became obvious that a large number of pages (almost all of them in fact) relied in some way on the way Netscape handled errors.

Why did pages depend on Netscape's error handling? Because Web developers changed their pages until they looked right in Netscape, with absolutely no concern for whether the pages were technically correct or not. I did this myself, back when I made my first few sites. I remember reading about HTML4 shortly after it became a W3C Recommendation and being shocked at my ignorance.

So, Microsoft reverse engineered Netscape's error handling. They did a ridiculously good job of it. The sheer scale of this feat is awe-inspiring. Internet Explorer reproduces aspects of Netscape's error handling which nobody at Netscape ever knew existed. Think about this for a minute.

Shortly after, Microsoft's browser became dominant and Netscape's browser was reduced to a minority market share. Other browsers entered the scene; Opera, Mozilla (the rewrite of the Netscape codebase), and Konqueror (later to be used as the base for Safari) come to mind, as they are still in active development. And in order to be usable, these browsers have to make sure they render pages just like Internet Explorer does, which means handling errors in the same way.

Browser developers and layout engine QA engineers probably spend more than half their total work hours debugging invalid markup, trying to work out which obscure aspect of the de facto error handling rules is being used to obtain the desired rendering. More than half!

It's easy to see why Web browser developers tend to be of the opinion that for future specifications, instead of having to reverse engineer the error handling behaviour of whatever browser happens to be the majority browser, errors should just cause the browser to abort processing.

Summary of the argument so far: Authors will write invalid content regardless. If the specification doesn't say what should happen, then once there is a dominant browser, its error handling (whether intentionally designed or just a side-effect of the implementation) will become the de facto standard. At that point there is no going back: any new product that wants to interoperate has to support those rules.

So what is the better solution? Specifications should explicitly state what the error recovery rules are. They should state what the authors must not do, and then tell implementors what they must do when an author does it anyway.

This is what CSS1 did, to a large extent (although it still leaves much undefined, and I've been trying to make the rules for handling those errors clearer in CSS2.1 and CSS3). This is what my Web Forms 2.0 proposal does. Specifications should ensure that compliant implementations interoperate, whether the content is valid or not.
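To make the idea concrete, here is a minimal Python sketch of what a fully specified recovery rule buys you. It is loosely modelled on CSS1's rule that a user agent must ignore a declaration with an unknown property. The function name, the tiny property list, and the naive splitting on semicolons (which ignores quoted strings, comments, and value validation) are assumptions made for illustration, not how any real CSS parser works. The point is only that every implementation following the same stated rule gets the same result from the same broken input.

```python
# A minimal sketch of a spec-mandated error recovery rule, loosely modelled
# on CSS1's forward-compatible parsing: a declaration with an unknown
# property, or with no colon, is dropped and parsing continues.
# KNOWN_PROPERTIES and parse_declarations are hypothetical illustrations.

KNOWN_PROPERTIES = {"color", "margin", "font-weight"}


def parse_declarations(block: str) -> dict[str, str]:
    """Parse the inside of a rule block, e.g. 'color: red; margin: 0'."""
    result: dict[str, str] = {}
    # Naive split: assumes no ';' inside strings or comments.
    for raw in block.split(";"):
        decl = raw.strip()
        if not decl:
            continue  # empty declaration, e.g. from a trailing ';'
        name, sep, value = decl.partition(":")
        name, value = name.strip().lower(), value.strip()
        if not sep or not name or not value:
            continue  # malformed declaration: ignore it, keep going
        if name not in KNOWN_PROPERTIES:
            continue  # unknown property: ignore it, keep going
        result[name] = value  # a later declaration overrides an earlier one
    return result
```

For example, `parse_declarations("color: red; colr: blue; margin 0; font-weight: bold")` keeps only `color` and `font-weight`: the misspelt property and the declaration missing its colon are skipped in exactly the same way by every implementation that follows the rule.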

Note that all this is moot if you use XML 1.x, because XML specifies that well-formedness errors should be fatal. So if you don't want to have this behaviour in your language, don't use XML.
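The difference is easy to see with Python's standard library. The sketch below feeds the same unclosed `<b>` element to an XML parser, which refuses to produce a tree, and to `html.parser`, which just keeps going. Note that `html.parser` is only a tokenizer: it reports the tags it sees and applies no tree-construction recovery rules at all, so it illustrates "don't stop" rather than any particular browser's error handling. The helper names are hypothetical.

```python
# Contrast XML's fatal well-formedness errors with a non-fatal HTML tokenizer,
# using only the Python standard library.
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

BROKEN = "<p>Some <b>bold text</p>"  # <b> is never closed


def parse_as_xml(text: str) -> str:
    try:
        ET.fromstring(text)
        return "parsed"
    except ET.ParseError as err:
        # A well-formedness error is fatal: no tree is produced at all.
        return f"fatal error: {err}"


class TagLogger(HTMLParser):
    """Records the start and end tags the tokenizer reports, in order."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.events.append(f"start {tag}")

    def handle_endtag(self, tag):
        self.events.append(f"end {tag}")


def parse_as_html(text: str) -> list[str]:
    logger = TagLogger()
    logger.feed(text)
    logger.close()
    return logger.events
```

Here `parse_as_xml(BROKEN)` reports a fatal mismatched-tag error, while `parse_as_html(BROKEN)` quietly returns `["start p", "start b", "end p"]` and leaves it to the consumer to decide what the missing `</b>` means.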


2004-01-18 23:00 UTC Void filling: Web Applications Language

About 11 months ago, I mentioned that the W3C had so far failed to address a need in the Web community: there is no language for Web applications. There is a language for hypertext documents (HTML), there is a language for vector graphic images (SVG), there is a vocabulary for embedding mathematics into both of those (MathML), and there are lots of support technologies (DOM, ECMAScript, CSS, SMIL)... But there is no language designed for writing applications, like Voidwars (a game) or Bugzilla (an issue tracking system), or for that matter the Mozillazine Forums or eBay auctions. What is needed is one or more markup languages specifically designed to allow the semantics of sites like these to be marked up, thus allowing for improvements in the accessibility of such sites.

It's been nearly a year since I first mentioned this, and the only group that seems to have done anything about this is Microsoft, with their worryingly comprehensive set of proprietary technologies (Avalon, XAML, WVG, etc) that appear designed to ensure vendor lock-in.

I intend to do something about this (hopefully within a W3C context, although that will depend on the politics of the situation). If you write Web-based applications, I would be interested in hearing about what your needs are. Please let me know: webapps@hixie.ch


2004-01-13 10:59 UTC Confusing spam

Maybe I've missed something. I don't know. Or maybe this is a joke. I just got a spam with the subject line "This letter can only define Nigeria Scam, a.k.a 419", which starts off explaining what 419 spam is, saying that much of Nigeria's government is corrupt, and so forth. Fair enough, I thought (curious as to the goal of a spam that explained some of the story behind 419 fraud, even if this wasn't even close to an accurate explanation). Maybe this is ironic educational spam from some well-meaning, although confused, spam fighter.

Then I read paragraph 5:

The point I am making is nothing more than asking you to handle a pure deal of approximately USD$50,000,000.00, which will take approximately two weeks to conclude from here. Then the funds clear in any account of yours after 72hrs upon the remittance.

What? I'm confused. I thought you just said this was a scam?

Maybe they are trying to raise the bar, so that only very gullible people fall for these scams?

2004-01-10 11:39 UTC Mad people, Tim, and the Groom Lake facility

On my way to the office (which is the staging point for my mission to today's primary objective, central Oslo) I passed an old lady who appeared to be muttering to herself, and it struck me: I can no longer tell the difference between insane people and people on hands-free mobile phones. Literally. I have no idea if she was on the phone or not. And she definitely wasn't speaking to anyone physically near her.

Later today, Tim will be arriving for a few months. I haven't seen him since August. Hopefully he'll be encouraging me to get to work slightly, ah, earlier, than I have been.

Last night I finished reading a series of seven books by Robert Doherty which I started over the new year. I bought Area 51 around the 23rd of December, finished it that day or the next, spent a few days itching to buy Area 51: The Reply, which I finally did around the 26th, along with Area 51: The Mission. I then spent about 2 days reading and about 8 days itching to buy Area 51: The Sphinx, which I finally did on Monday (the 5th), along with Area 51: The Grail, Area 51: Excalibur, and Area 51: The Truth. There appear to be no real analysis sites on the Web for this series, which surprises me. (Is The Lurker's Guide an anomaly, or what? I made my entry into science fiction fandom with Babylon 5, which, at the aforementioned site, has incredibly detailed analysis of every scene of every show, cross-referenced across episodes with detailed plot descriptions, directors' comments, and so forth. Did other series not cause that kind of response? Even Stargate SG-1 doesn't really seem to have that kind of detailed analysis. Although, having tried writing one for some episodes myself, I can understand that, I guess. Good analysis is long, hard work.)

Turns out there is another book, Area 51: Nosferatu, now available, with yet another (Legend (Area 51)) coming in "March" (quotemarks because I've become rather familiar with projected publication dates what with my involvement with software development, specification editing, and book proof-reading). I also noticed, while buying those books, that one of my favourite authors, Peter F. Hamilton (author of the simply stunning Night's Dawn trilogy) has some more books on sale now.

However, no more books for me for at least a week. Reading does terrible things to my productivity. I have an addictive personality and very little self-control (which is why I don't drink) so when I start reading, I have to finish, even if it is past 5am. Not something I want to keep doing for extended periods of time, really.

I'd better be off now, my exfil window is closing.

2004-01-06 15:23 UTC Reminder of some notes from a Winter in Oslo

It seems I forgot about this, so I'll just re-mention it in the hopes that I'll remember it now: When cycling in the snow,

You have no brakes, and
You can only go in a straight line.