Hixie's Natural Log

2002-11-18 04:59 UTC Markup Challenge: diveintomark.org

The distinguished Mark Pilgrim was the first to come forward as possibly having a perfect Web log, although he admitted to having a few known issues.

While you read this site review, bear in mind that it is purposefully, intensely pedantic. As my own site demonstrates, I'm coming from the view that complying with the spirit of the specs is a lot more important than what a page may look like in famously broken browsers like Windows IE 6 and earlier.

Enough of the disclaimers, though. On with the review:


The whole document validates as XHTML 1.0 Strict, which is cool.


Of course, I have to mention the fact that Mark is sending his XHTML page as text/html. Why not send it as text/html to IE, and application/xhtml+xml to everything else? This is roughly what Xiven does, for instance.
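A minimal sketch of that negotiation, assuming some server-side hook that can see the request's Accept header. The function name pick_content_type is hypothetical, and the q-value handling is deliberately crude: the XHTML type is served only to clients that list it explicitly, so a browser that sends just */* falls back to text/html.

```python
XHTML = "application/xhtml+xml"
HTML = "text/html"


def pick_content_type(accept_header):
    """Return the MIME type to serve an XHTML page with.

    application/xhtml+xml is chosen only when the client lists it
    explicitly with a non-zero q-value; a missing header or a bare
    */* gets text/html.
    """
    for item in (accept_header or "").split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != XHTML:
            continue
        q = 1.0  # a media range without a q parameter defaults to 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q-value as a refusal
        if q > 0:
            return XHTML
    return HTML
```

A response chosen this way should probably also carry a Vary: Accept header, so that caches don't hand the XHTML variant to a client that never asked for it.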

More MIME types

The main page is not the only page sent with the wrong MIME type, unfortunately. The FOAF file is sent as text/plain, while it should be some XML variant. This is especially noticeable because the <link> element pointing to that file claims its MIME type is application/rdf+xml.

Similarly, the <link> element pointing to the RSS feed says its MIME type is application/rss+xml but it is actually the controversial text/xml. (Which causes another problem — what is the character set of the RSS feed? Per RFC3023 (section 3.1, paragraph 3) it's US-ASCII but per the XML declaration in the file, it's UTF-8. Thankfully in this case it doesn't really matter since all the codepoints in the file are common to both encodings.)
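To make the character set conflict concrete, here is a rough sketch of the defaulting rules as I read RFC 3023. The function name effective_xml_charset is made up for illustration, byte order marks are ignored, and any type that is not text/xml is treated like application/xml, which is a simplification.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration.
XML_DECL_ENCODING = re.compile(
    rb'''<\?xml[^>]*encoding=["']([A-Za-z0-9._-]+)["']'''
)


def effective_xml_charset(content_type, body_start):
    """Guess the encoding a consumer should assume for an XML response.

    content_type is the Content-Type header value; body_start is the
    first few bytes of the response body.
    """
    parts = [p.strip() for p in content_type.split(";")]
    media_type = parts[0].lower()

    # An explicit charset parameter always wins.
    for param in parts[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip('"').lower()

    # text/xml without a charset is US-ASCII; the XML declaration
    # inside the file is not consulted.
    if media_type == "text/xml":
        return "us-ascii"

    # Otherwise fall back to the XML declaration, then to UTF-8.
    match = XML_DECL_ENCODING.match(body_start)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"
```

Fed a text/xml response whose body opens with an XML declaration naming UTF-8, this returns us-ascii, which is exactly the disagreement described above.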

While we're on the subject, I'll just quickly add that the MIME type of the JS file (text/x-javascript) doesn't match the MIME type used in its <script> element (text/javascript) either.
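Mismatches like these are easy to check for mechanically. The sketch below is one way to do it; the helper names and the dictionary keys are placeholders of mine, not anything taken from Mark's site, and served_type makes a real HEAD request, so it is only worth calling against a server you are happy to poke.

```python
from urllib.request import Request, urlopen


def served_type(url):
    """Return the bare MIME type the server sends for url (HEAD request)."""
    with urlopen(Request(url, method="HEAD")) as response:
        return response.headers.get_content_type()


def bare_type(content_type):
    """Strip parameters such as charset and normalise case."""
    return content_type.split(";", 1)[0].strip().lower()


def mime_mismatches(declared, served):
    """List (key, declared, served) for every entry whose types differ."""
    problems = []
    for key, declared_type in declared.items():
        if key not in served:
            continue  # nothing to compare against
        want, got = bare_type(declared_type), bare_type(served[key])
        if want != got:
            problems.append((key, want, got))
    return problems


# The types reported in this review, keyed by placeholder names.
declared = {
    "foaf": "application/rdf+xml",
    "rss": "application/rss+xml",
    "js": "text/javascript",
}
served = {
    "foaf": "text/plain",
    "rss": "text/xml",
    "js": "text/x-javascript",
}

for key, want, got in mime_mismatches(declared, served):
    print(key, "declared as", want, "but served as", got)
```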

<div id="logo">

This element appears to be there purely for stylistic reasons; it doesn't add anything to the structure of the document. As such, it should be removed.

Actually, <div>s are a pain. At the moment (i.e. in the pre-XHTML2 world), they are basically serving two roles: section delimiters (the <section> element in XHTML2) and presentational hooks for CSS (the <div> and <span> elements in XHTML2). Section delimiting is fine, but adding hooks for presentation into a supposedly semantically marked-up document is very dubious. Unfortunately, CSS2 has very limited abilities for adding stylistic hooks to content (the :before, :after, :first-letter and :first-line pseudo-elements are about it) and the technically correct solution (using XSLT to add the hooks on the client side) is a pain.

My rule for whether a <div> is semantic (delimits a section) or presentational (only there to be used from CSS) is pretty simple. Does the block start with a header and then have content? Or, if not, would the block still make sense if you added a header to it? If the answer to either question is "Yes" then the <div> is legitimate, otherwise you should look for ways around it.
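The first half of that rule can be approximated mechanically. The following is only a sketch: the class and function names are invented, the sample markup is made up rather than copied from any real page, and the second question (would a header make sense here?) still needs a human.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
VOID = {"br", "hr", "img", "input", "link", "meta"}  # never have content


class DivAudit(HTMLParser):
    """Report, for each <div>, whether it is a heading followed by content."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # entries: [tag, label, first_child, child_count]
        self.results = []        # (label, looks_like_section), in closing order

    def _note_child(self, name):
        if self.open_elements:
            parent = self.open_elements[-1]
            if parent[2] is None:
                parent[2] = name
            parent[3] += 1

    def handle_starttag(self, tag, attrs):
        self._note_child(tag)
        if tag in VOID:
            return
        label = None
        if tag == "div":
            attrs = dict(attrs)
            label = attrs.get("id") or attrs.get("class") or "div"
        self.open_elements.append([tag, label, None, 0])

    def handle_data(self, data):
        if data.strip():
            self._note_child("#text")

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag, tolerating sloppy nesting.
        for i in range(len(self.open_elements) - 1, -1, -1):
            if self.open_elements[i][0] == tag:
                for name, label, first, count in reversed(self.open_elements[i:]):
                    if name == "div":
                        self.results.append((label, first in HEADINGS and count > 1))
                del self.open_elements[i:]
                break


def audit_divs(markup):
    parser = DivAudit()
    parser.feed(markup)
    parser.close()
    return parser.results


sample = (
    '<div id="logo"><h1>Site title</h1></div>'
    '<div id="entry"><h2>Title</h2><p>Body</p></div>'
)
print(audit_divs(sample))
```

A <div> that holds nothing but a header is flagged as presentational here, while one that starts with a header and goes on to hold content passes.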

In this case, the <div> is definitely presentational, since all it contains is the page header (which is correctly marked up using an <h1>).

<span id="logoleft">

Well, in theory, <span>s with just a class or id are as bad as <div>s. However, I don't see any way around it, and indeed I use <span>s myself for exactly the same reason, so never mind!

<span class="divider">&nbsp;</span>

Mark assures me that this is to get around a bug in Bobby, a tool used to detect accessibility problems on a site. This is silly! You should not make your page less compliant with the accessibility guidelines just to appease a buggy tool.

In fact, several of the uses of <span class="divider">&nbsp;</span> on Mark's site actually make the page harder to read using Lynx and other non-CSS browsers. I would recommend removing them all and complaining to the Bobby team.

<a class="skip" href="#startnavigation">Jump to navigation</a>

This needs to be marked up as a paragraph. If it weren't for the <div>, which as mentioned above really should be removed, this would be invalid. Other than that, though, this is a great aid to accessibility. For instance, when browsing this site with Lynx, it makes finding your way around the site a lot easier.

<div id="wrapper">, <div id="main">

These are redundant. A case could be argued for keeping one of them (it's the main content "section") but there is no doubt that at least one of them should be removed, since they exactly shadow each other. The name of the outer one's id is a giveaway too.

Redundant titles

This is a very minor nit, but I noticed that the permalinks have titles set on both the link, and the link label (an image). Since the image itself is not the permalink, I would suggest removing its title attribute.

The alternate text is well chosen, however, conveying exactly the same as the image. Indeed, I'd say the image is harder to understand than its alternate text! (I had to examine the square box to get its tooltip before I realised what it was for.)

<p class="firstparagraph">

That class attribute doesn't add anything, especially now that CSS has a :first-of-type selector.

<cite>dive into mark</cite>

Fine use of an often misunderstood element.


Good use of U+2019, the preferred character to use for apostrophe.


Empty paragraphs are disliked by the HTML specification. This is almost certainly caused by an over-zealous CMS, in which case it is a good example of why CMSs have to be very carefully designed, and are not simply an alternative to writing accessible markup!

<p class="categories">&nbsp;

This non-breaking space is extraneous, and doesn't add anything valuable to the content (what does a word consisting of just a space on its own mean?). It's not entirely clear to me why the space is needed here, so I presume it is to work around some obscure browser bug.

Note that in this case, the single link is in a paragraph of its own, as I suggested the "Jump to navigation" link should be. This is good.

<span class="divider">[twisty.com] </span>

Normally I wouldn't even mention something like this, but I'm pretty sure it's not what was intended, and I wasn't really sure what was. The problem here, basically, is that I don't think the class is correct. How is the domain a divider? One could also argue that the content of the span is redundant, since it's information that is already stored in the link.

There are some other cases of strange use of the "divider" class in the menu section.

<div class="center"><div class="hr" title="Lorem ipsum is a harsh mistress"><hr /></div></div>

Wow! This is probably the worst line of the entire page. First, class="center" is presentational markup in disguise. The class should be made semantic ("divider" might actually be correct in this case). Second, the inner <div> is redundant (like the wrapper/main <div>s above). Third, the class of the inner <div> is redundant, since it is just a repeat of its contents. Fourth, the <hr>: Since this page is marked up using <div>s as section markers, it makes little sense to also use <hr>s.

I'd recommend removing the entire line, and using <div>s to mark up the days instead.

<a name="startnavigation" id="startnavigation"></a>

This appears to be redundant with the previous non-blank line, which also sets an id. If it's important to stick with <a name=""> markup, then it should preferably be wrapped around the header on the next non-blank line (think about what the element, as written, means: the start of the navigation is an empty string).

Overall, this is a very well designed site, with most of the problems appearing to be conscious decisions to work around bugs in software, rather than mistakes. The stylesheets are very well written, with pixels only used in the very few legitimate cases, ems and percentages being used elsewhere, colours and backgrounds specified together, and so forth. The markup is semantically rich, presentational markup is avoided except in a few cases, and many accessibility features are well used. This site will probably not be the most valid, semantically rich, no-presentational-markup, strictly compliant Web log of this challenge, but it is definitely a top contender.

The next site I'll be examining is Aaron Swartz's Web log.
