Hixie's Natural Log

2002-11-24 00:19 UTC Markup Challenge: aaronsw.com

Next up in the Markup Challenge is Aaron Swartz. Aaron is the HWG's representative on the RDF core working group! I could never understand RDF, and have a great respect for those who can.

I must once again remind you that while you read this site review, you should bear in mind that it is purposefully intensely pedantic. I'm coming from the view that complying to the spirit of the specifications ("theoretical accessibility") is more important than what the site is rendered in, in practice ("practical accessibility", if you will). This is not a completely "ivory tower" position; as Web browsers improve, standards-compliant content becomes more and more accessible and usable in different contexts, while non-compliant content becomes less and less useful.

Also, Aaron, like Mark, volunteered for this and asked me to be as pedantic as possible. So...


The first problem is a minor one given the state of our domain name space, but ".com" was supposed to be for companies, and this site is not a commercial site. Of course, I'm guilty of domain name polution myself, so maybe I shouldn't be so eager to raise this issue!


A great start: the document validates to a Strict DTD.

XHTML sent as text/html

All my readers are painfully aware of my position on XHTML being sent as text/html (or at least, that's the impression I get from all the e-mails I receive with the disclaimer "Yes, I know my site is sent as text/html, I'm so sorry, please don't shoot me, please"). So they would probably expect me to complain loudly here about how Aaron is doing the wrong thing, etc.

Well, no such luck. Aaron and I are currently in the midst of a discussion regarding the pros and cons of XHTML-as-text/html, and so it is possible that I may have my mind changed (as I did about application/xhtml+xml vs text/xml). Therefore, while I wait to see how that unfolds, my opinions on XHTML-as-text/html are on hold.

Correct MIME types

The RSS feed and the stylesheets are correctly sent with the right MIME types. Incidentally, as several people have pointed out to me, application/rss+xml is technically not a valid MIME type. Neither is the almost ubiquitous text/javascript. But if the standards communites are going to drag their feet defining obvious MIME types, the world will continue without them, sadly. There is an obsolete application/rss+xml MIME type registration ID, does anyone know why it didn't get moved to the standards track category (like text/xml) or to the informational categary (like text/html)?

No defined character set

The HTML files contain no encoding information, which means that a number of conflicting specifications (notably HTTP, MIME, RFC2854, HTML 4.01, XHTML 1.0 and XML) enter the game to try to determine which character set should be used to decode them, with the only clear conclusion being "guess". Luckily, Aaron only uses codepoints that are a common subset of US-ASCII, UTF-8, and ISO-8859-1, so the confusion doesn't cause any ambiguities.

The CSS and RSS files similarly have no explicit encoding information, but this is less of a problem. Only encodings that are a superset of US-ASCII may be used with text/css so if the file only contains US-ASCII characters the point is moot, and RSS files have well defined rules for determining the character set (the application/rss+xml specification points to section 3.2 of RFC 3023 which points to section 4.3.3 of the XML specification).

Setting colours without backgrounds

I had my default background colour set to a dark red (#BB0000, it makes a nice background) so I didn't realise that there was a header on the pages — they are coloured #b00, the same colour! My user stylesheet looked like this:

:root { background: #BB0000; color: white; }
:link { color: yellow; background: transparent; }
:visited { color: orange; background: transparent; }

This stylesheet also made it very hard read to read the lines giving the dates of when the articles were posted.

While I applaud the idea of leaving most of the colours to user, you have to be careful to always give colours and backgrounds together, so as to avoid this kind of clash. (When setting a colour to transparent, make sure to then set all colours, so as to prevent further clashes.)

<div id="banner">

As with Mark's <div id="logo">, this element appears to be there purely for stylistic reasons and doesn't seem to add anything to the structore of the document. As such, it should be removed.

The contents of the element are in the form "title" "subtitle", which isn't very well handled by XHTML1, unfortunately. A better solution would be either to use both <h1> and <h2> in series, or, preferably in my opinion, to use a single <h1> with the important part emphasised with an <em> element or some such. This problem was recently discussed in www-html, hopefully the HTML WG will notice and give us a subheading element, or at least explicitly state how to mark up subheadings. This is a problem I often hit.

<div class="content"><div id="main">

At least one of those <div>s is redundant, if not both.

<h2 class="title">

That's a tautology: the class isn't adding anything.

alt="H4X0r ECONOMIST: make economy"

That alternate text is more like a title than alternate text. A much better alternate text would be "H4X0R ECONOMIST. lol GPL wh4Tev3r make economy. Alan Greenspan scowls. The Supreme Court finds that, owing to GPL compliance issues raised in the case Free Software Federation v. Greenspan, our nation must henceforth be known as the United Nations of GNUmerica. William Rhenquist has a serious expression. One week later: Richard Stallman is happy: YES", which conveys roughly the same as the comic. A longdesc to a description of the comic's typography and layout would be useful too.

<table class="invisible">

While I think a table is actually a not unreasonable element to use for the semantics here (two paragraphs being compared), the class is. "invisible" is quite clearly a presentational semantic. A better class would have been "comparative alegory", for example.

<div class="img"><img src="http://www.aaronsw.com/2002/gelernterNov7-1" alt="A file cabinet being thrown out the window" /><br />Jon Keegan, New York Times</div>

Here we see several problems back to back. First, why a <div>? This is a paragraph, not a section. Secondly, the poor alternate text. The image is purely decorative as far as I can tell: if I was reading this story to someone, the image would not convey any additional information. Appropriate alternate text is therefore probably "". Next, the <br> element. There is nothing inherent about that paragraph that semantically requires a line break: that the image and the text appear on different lines is purely presentational. Finally, the caption, which I presume is a citation, should be marked up using a <cite> element.


That class is redundant, given that the calendar has an ID.

Lists that don't use list markup

There's a list of entries in the "Archives" section of the menu sidebar, but it is delimited by <br> elements instead of using one of HTML's list elements. This same problem is repeated in several places, e.g. "Feeds I Read".

More presentational <br>

In the "What I'm Doing" section there are some very blatent cases of <br>s that have absolutely no semantic purpose. Most should be removed, or turned into paragraphs or lists.

alt="spread the dot" ... spread the dot

The image doesn't seem to actually be conveying anything, certainly not "spread the dot" since that text is immediately repeated after the dot. Maybe alt="" would be better?

<div class="footer"> <address>

According to Dan Connolly, the <address> element is a general footer element (unfortunately I can't find a reference to the e-mail thread in which we discussed this; maybe I am misremebering what he said), so that would make the <div> element here redundant. Unfortunately the HTML4 and XHTML2 specifications don't really back that up, so maybe a footer <div> is best. In any case, the copyright notice in the footer should be in a paragraph.

The main errors I would be concerned about are the inappropriate <br> and poor alternate texts, as they will be affecting accessibility today, but the other errors are not that minor either. I liked the stylesheets in general, as they tried to give the users the final word on most issues. The lengths are given in relative units, which is always good.

Overall, not quite as good an effort as Mark's, but still respectable. I look forward to seeing whether Aaron fixes all the errors as Mark did!

The next site I'll be examining is Mike Shaver's Web log. I don't know how many more of these I'm going to do, it depends largely on how long I can keep doing them without either getting bored or before I run out of new errors (not much point reporting the same errors over and over again). Thanks, by the way, to all the kind people who have been picking my own site to pieces by e-mail... I'm glad to say that don't think there are any outstanding issues.