loge.hixie.ch

Hixie's Natural Log

2003-04-28 05:35 UTC This is why I hand-author

For reasons that I no longer try to fathom, I have to proof-read Narley's essays.

She writes them in Word. I don't accept Word documents, since they are in a proprietary format, yada yada, so she exports them to "HTML" before e-mailing them to me. Now I'm sure you're all very familiar with what Word calls HTML, but in case you're not, here is the header:

<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns="http://www.w3.org/TR/REC-html40">

You might think it odd that Microsoft would use the wrong namespace for the HTML content (the XHTML namespace is http://www.w3.org/1999/xhtml, not the URL of the HTML 4.0 Recommendation), but you'd be wrong to think that that is the worst problem with the above markup. No, the worst problem is that it isn't XML, it's HTML. You can tell this from the fact that the rest of the markup isn't even remotely well-formed. (The missing DOCTYPE is something we'll gloss over, since frankly I'd rather this junk just be handled in quirks mode without any difficult questions.)
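The well-formedness claim is easy to check mechanically. Here is a minimal sketch in Python using the standard library's XML parser; the fragment is a cut-down stand-in (the header's namespace declarations plus one paragraph with an unquoted attribute of the kind Word emits), and the helper name is_well_formed is made up for illustration:

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for Word's output: namespace declarations on the root,
# then a paragraph whose class attribute is unquoted (illegal in XML).
WORD_FRAGMENT = (
    '<html xmlns:o="urn:schemas-microsoft-com:office:office" '
    'xmlns="http://www.w3.org/TR/REC-html40">'
    '<p class=MsoNormal>text</p>'
    '</html>'
)

def is_well_formed(markup):
    """Return True if markup parses as XML, False otherwise (hypothetical helper)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

print(is_well_formed(WORD_FRAGMENT))
print(is_well_formed('<p class="MsoNormal">text</p>'))
```

An XML parser has to reject the first string outright, which is why the xmlns attributes are decoration rather than a sign that the file is XML.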

Let's have a look at one other bit of the markup:

...</p>
<p class=MsoNormal><![if !supportEmptyParas]>&nbsp;<![endif]><o:p></o:p></p>
<p class="MsoNormal">...

Say what? Let's ignore the utter invalidity of this line for one minute and focus on what it is trying to do: increase the margin between the surrounding paragraphs. I think I can see the reasoning behind this (it almost certainly involved someone in a meeting one day saying "roundtripping is more important than cleanliness") but still, it should have been replaced by simply placing class "thematic-break" on the following element and then adding the following style rule, surely:

.thematic-break { margin-top: 2em; }
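To make the substitution concrete, here is a rough Python sketch that collapses one of these spacer paragraphs into a class on the paragraph that follows. It assumes the markup has exactly the shape quoted above and that the next element is another MsoNormal paragraph; it is not a general Word cleaner, and the function name collapse_spacers is invented for the example:

```python
import re

# Matches one Word "empty paragraph" spacer followed by the opening tag of the
# next MsoNormal paragraph. Only the exact shape quoted above is handled.
SPACER = re.compile(
    r'<p class="?MsoNormal"?><!\[if !supportEmptyParas\]>&nbsp;<!\[endif\]>'
    r'<o:p></o:p></p>\s*'
    r'<p class="?MsoNormal"?>'
)

def collapse_spacers(html):
    """Drop each spacer paragraph and mark the following paragraph instead."""
    return SPACER.sub('<p class="MsoNormal thematic-break">', html)

sample = (
    '<p class=MsoNormal>First.</p>\n'
    '<p class=MsoNormal><![if !supportEmptyParas]>&nbsp;<![endif]><o:p></o:p></p>\n'
    '<p class="MsoNormal">Second.</p>'
)
print(collapse_spacers(sample))
```

The spacing then lives in the one style rule, and the document body carries only the class name.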

Now, fast forward a bit to when I try to edit this document to add my comments. Normally, I would switch to my Emacs window and edit away. However, not even my regular expression search-and-replace skills are going to give me enough courage to try that here, and I don't really trust HTML Tidy to remove the garbage from this file without also removing things like extraneous spaces (part of proof-reading these documents involves pointing out when there are extra spaces that should be removed).

So instead, I go ahead and eat my own dogfood: I fire up Mozilla Composer.

My first problem is that the font size is too small. So I fish around the UI, looking for an "increase font size" option. I eventually find one, and eventually figure out that to make it work, you have to select all the text first.

Net result (brace yourself):

...</big></big></big></p>
<big><big><big> </big></big></big>
<p class="MsoNormal"><!--[if !supportEmptyParas]--><!--[endif]--><big><big><big> <o:p></o:p></big></big></big></p>
<big><big><big> </big></big></big>
<p class="MsoNormal"><big><big><big>...

What on Earth! Now, added to the nonsense from Word, we have extra nonsense from Composer! If I say "increase the font size", I expect exactly one line to be added to the document, and that would be a style rule:

body { font-size: x-large; }

...with possibly some on-the-fly changes to the rest of the stylesheet to make sure it is expressed in ems and not absolute units. Sprinkling garbage all over the document markup really isn't acceptable.
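The "on-the-fly changes" part is mostly arithmetic. As a rough sketch in Python, assuming the only absolute unit in play is pt and assuming a 12pt base size (both assumptions of this example, as is the function name pt_to_em), the rewrite could look like this:

```python
import re

# A number followed by "pt", not glued to a preceding word character or dot.
PT_LENGTH = re.compile(r'(?<![\w.])(\d+(?:\.\d+)?)pt\b')

def pt_to_em(css, base_pt=12.0):
    """Rewrite every pt length in css as an em length relative to base_pt."""
    def repl(match):
        ems = float(match.group(1)) / base_pt
        return f"{ems:g}em"
    return PT_LENGTH.sub(repl, css)

print(pt_to_em("p { font-size: 18pt; margin: 6pt 0; }"))
# prints: p { font-size: 1.5em; margin: 0.5em 0; }
```

This is a simplification: an em in font-size is relative to the parent's font size, while an em in margin is relative to the element's own font size, so a real editor would have to take the cascade into account rather than divide by one constant.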

Maybe if Web editors were forced to support alternate stylesheets this kind of nonsense would go away. After all, you can't support alternate stylesheets if you use style attributes and presentational markup.

Ahem. I think I'll just not look at the source view and pretend I'm using a WYSIWYG editor; I'm less likely to lose what remains of my sanity that way...