Hixie's Natural Log

2002-08-18 23:23 UTC Valid garbage in, valid garbage out: Answers

The correct answers to the quiz I posted just over 2 days ago (find the four markup errors in a document which the automated W3C validator validates) are, in document order:

  1. The language code english is not a valid two-letter language code per RFC 1766. A more correct language code would be en, en-GB, or en-GB-hixie.

    This is the error that I think advanced HTML-specific validators should be able to catch. (As opposed to SGML and XML validators, which is what the W3C validator is.) None of the other errors could be caught by an automated tool until we make huge leaps in AI.

  2. The <title> element doesn't contain a title. The contents of the title element would have been more appropriate as <meta name="author" content="Ian Hickson"> or some such (maybe using Dublin Core semantics). A correct title would have been "Current weather in Berlin, Germany" (same as the H1).

  3. The <cite> element in the H1 is grossly inappropriate. The <em> element would be better, if the intent was to emphasise the name of the city or the site of the weather. If the <cite> element was to be used at all, it should have been to give the source of the weather information (namely, CNN Weather).

  4. The alternate text for the icon ("low wind icon") is not alternate text, it's descriptive text. The text given would have been better in a title attribute. Correct alternate text would be something along the lines of "The wind is low.", which would have conveyed exactly what the icon was supposed to convey. Note that simply "low wind" would be suboptimal alternate text (although certainly more correct than "low wind icon"), as it would not have blended in well with the rest of the paragraph.
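To make the first answer concrete, here is a minimal sketch of the kind of check an HTML-specific validator could run on a lang attribute value. The function name is my own invention, and the check is deliberately shallow: it tests only the shape RFC 1766 assigns a meaning to (a two-letter primary tag, or the special "i" and "x" prefixes) and does not look the code up in ISO 639.

```python
import re

# RFC 1766 syntax: a primary tag plus optional subtags, each 1 to 8 ASCII letters.
_TAG_SYNTAX = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z]{1,8})*")


def plausible_rfc1766_tag(tag: str) -> bool:
    """Return True if `tag` has a shape that RFC 1766 defines a meaning for.

    Shape check only (an assumption of this sketch): the primary tag is not
    looked up in the ISO 639 code list.
    """
    if not _TAG_SYNTAX.fullmatch(tag):
        return False
    primary = tag.split("-", 1)[0].lower()
    # Two letters: an ISO 639 language code. "i": IANA-registered. "x": private use.
    return len(primary) == 2 or primary in ("i", "x")


for candidate in ("english", "en", "en-GB", "en-GB-hixie"):
    print(candidate, plausible_rfc1766_tag(candidate))
```

Note that english passes the bare syntax rule (seven letters), which is why a syntax-only check is not enough; it is the rule about primary tags that rejects it.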

I received a few incorrect answers, and to satisfy everyone's curiosity, I'll explain why they were wrong:

"C" is an ABBR and should be marked up as such.

Not all abbreviations need to be marked up. In fact, I would recommend marking up only those that are likely to be unfamiliar to readers; otherwise, UAs that highlight or expand abbreviations and acronyms will make your page annoying to read.

Missing HEAD and BODY elements or HTML and P end tags.

Actually, all of these are optional in HTML. (If they weren't, this would be the kind of error a validator would have caught, anyway.)

If you look in the HTML DTD or in the element index, you'll see that HTML, HEAD and BODY are marked as having optional start and end tags — that's what O means. For example:

<!ELEMENT BODY O O (%block;|SCRIPT)+ +(INS|DEL) -- document body -->

Similarly, the P element has an optional end tag (the first dash means the start tag is not optional):

<!ELEMENT P - O (%inline;)*            -- paragraph -->

SGML processors (HTML is an SGML language, just as XHTML is an XML language) are able to unambiguously determine where implied start and end tags would appear. Note that this is a rather confusing feature of SGML (although far from the most confusing one), so XML doesn't have it. Don't go omitting start and end tags in XHTML documents; they are not optional there!
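If reading the O and - flags by eye gets tedious, the two tag-omission flags can be pulled out of an element declaration mechanically. This is a rough sketch, not a DTD parser: the function name and the returned dictionary keys are made up for illustration, and it only understands the simple declaration shape quoted above.

```python
import re

# Captures the element name (or a parenthesised name group) followed by the
# start-tag and end-tag omission flags: "O" means omissible, "-" means required.
_ELEMENT_DECL = re.compile(
    r"<!ELEMENT\s+(\([^)]*\)|\S+)\s+([-O])\s+([-O])\s",
    re.IGNORECASE,
)


def tag_omission(declaration: str) -> dict:
    """Report which tags an SGML element declaration allows to be omitted."""
    match = _ELEMENT_DECL.match(declaration.strip())
    if match is None:
        raise ValueError("no tag-omission flags found in declaration")
    name, start_flag, end_flag = match.groups()
    return {
        "element": name,
        "start_tag_optional": start_flag.upper() == "O",
        "end_tag_optional": end_flag.upper() == "O",
    }


print(tag_omission("<!ELEMENT BODY O O (%block;|SCRIPT)+ +(INS|DEL) -- document body -->"))
print(tag_omission("<!ELEMENT P - O (%inline;)*            -- paragraph -->"))
```

For BODY both flags come back as optional; for P only the end tag does.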

The IMG shouldn't be in a P.

Why not? If you replace the image with its equivalent alternate text, you'll see that in fact it makes perfect sense for it to be there.

The DTD doesn't include a URI.

In SGML, the URI (actually, the "system identifier") is optional. The bit before it (the "public identifier") is expected to be mapped to the URI by the application through the use of a "catalog".

Again, XML changed this, so in XHTML, you have to give both. Note that this kind of error, if it had been an error, would have been caught by the validator, anyway.
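To illustrate the catalog idea, here is a small sketch of how an application might resolve a DOCTYPE that gives only a public identifier. The in-memory dictionary stands in for a real catalog file, the function name is hypothetical, and the rule that an explicit system identifier wins over the catalog is a simplification I'm assuming for the example. The two identifier pairs are the ones published for HTML 4.01 Strict and Transitional.

```python
import re

# Hypothetical in-memory catalog: public identifier -> system identifier (URI).
CATALOG = {
    "-//W3C//DTD HTML 4.01//EN": "http://www.w3.org/TR/html4/strict.dtd",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "http://www.w3.org/TR/html4/loose.dtd",
}

# A DOCTYPE with a public identifier and an optional system identifier.
_DOCTYPE = re.compile(
    r'<!DOCTYPE\s+\w+\s+PUBLIC\s+"([^"]*)"(?:\s+"([^"]*)")?\s*>',
    re.IGNORECASE,
)


def resolve_dtd(doctype, catalog=CATALOG):
    """Return the URI of the DTD named by a DOCTYPE, or None if unknown."""
    match = _DOCTYPE.match(doctype.strip())
    if match is None:
        raise ValueError("not a PUBLIC doctype declaration")
    public_id, system_id = match.groups()
    if system_id:
        # Simplifying assumption: an explicit system identifier is used as-is.
        return system_id
    # No URI given: fall back to the catalog mapping.
    return catalog.get(public_id)


print(resolve_dtd('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">'))
```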

The CITE element cannot be in an H1 element.

Actually, it can: CITE is part of %phrase; which is part of %inline; and the content model for H1 is %inline;, as per the DTD:

<!ENTITY % phrase "EM | STRONG | DFN | CODE |
                   SAMP | KBD | VAR | CITE | ABBR | ACRONYM" >
<!ENTITY % inline "#PCDATA | %fontstyle; | %phrase; | %special; | %formctrl;">
<!ENTITY % heading "H1|H2|H3|H4|H5|H6">
<!ELEMENT (%heading;)  - - (%inline;)* -- heading -->

Again, if this was not allowed, the validator would have mentioned it, since that would be a simple thing that the DTD could explicitly call out.
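The chain from H1 to CITE can also be followed mechanically by expanding the parameter entities. The sketch below copies only the three entity declarations quoted above into a dictionary; %fontstyle;, %special; and %formctrl; are not defined here, so they are left unexpanded. The function names are my own, and this is nowhere near a real DTD parser.

```python
import re

# The parameter entities quoted from the DTD above, and nothing else.
ENTITIES = {
    "phrase": "EM | STRONG | DFN | CODE | SAMP | KBD | VAR | CITE | ABBR | ACRONYM",
    "inline": "#PCDATA | %fontstyle; | %phrase; | %special; | %formctrl;",
    "heading": "H1|H2|H3|H4|H5|H6",
}

_REFERENCE = re.compile(r"%([A-Za-z][\w.-]*);")


def expand(text, entities=ENTITIES):
    """Recursively replace %name; references; unknown names are left alone."""
    def substitute(match):
        name = match.group(1)
        if name not in entities:
            return match.group(0)
        return expand(entities[name], entities)
    return _REFERENCE.sub(substitute, text)


def names_in(model, entities=ENTITIES):
    """Return the set of names that appear in a fully expanded model."""
    return set(re.findall(r"[#%\w;.-]+", expand(model, entities)))


# The declaration reads: <!ELEMENT (%heading;) - - (%inline;)* -- heading -->
print("H1" in names_in("(%heading;)"))    # the declaration covers H1
print("CITE" in names_in("(%inline;)*"))  # and its content model admits CITE
```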

The value of the alt attribute is incorrect because it doesn't describe its contents; "Cloud and wind-blown tree" would be better.

"Cloud and wind-blown tree" would be reasonable title text, and a good start on longdesc text, but it would be terrible alternate text.

What information is the icon conveying? If you were reading out the document, would you say:

Title: Current weather in Berlin, Germany.

There are thunderstorms in Berlin at the moment. The air is very humid. The temperature is a warm 24°C. Cloud and wind-blown tree.

Or would you read:

Title: Current weather in Berlin, Germany.

There are thunderstorms in Berlin at the moment. The air is very humid. The temperature is a warm 24°C. The wind is low.

I'd hope you'd read it like the second. And that tells you immediately what the alternate text should be.
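The read-aloud test is easy to automate: swap each image for its alt text and look at the sentence that comes out. Here is a minimal sketch using Python's standard html.parser module. The markup in TEMPLATE is a hypothetical reconstruction of the paragraph (the file name wind.png in particular is made up), and the class and function names are mine.

```python
from html.parser import HTMLParser


class AltTextReader(HTMLParser):
    """Collects the text of a fragment, replacing each IMG with its alt text."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.parts.append(alt)

    def handle_data(self, data):
        self.parts.append(data)


def read_aloud(fragment):
    """Return the fragment as plain text, with whitespace normalised."""
    reader = AltTextReader()
    reader.feed(fragment)
    reader.close()
    return " ".join("".join(reader.parts).split())


# Hypothetical markup for the paragraph; {alt} is filled in below.
TEMPLATE = (
    "<p>There are thunderstorms in Berlin at the moment. The air is very humid. "
    'The temperature is a warm 24\u00b0C. <img src="wind.png" alt="{alt}"></p>'
)

for alt in ("Cloud and wind-blown tree.", "low wind icon", "The wind is low."):
    print(read_aloud(TEMPLATE.format(alt=alt)))
```

Only the last candidate reads like a sentence that belongs in the paragraph.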

The BR tag would be better than the P tag in this case.

The BR tag should almost never be used. (Indeed, it is deprecated in XHTML2.) In this case, the P element is correct, since it marks a paragraph.

Below are the scores for those of you who replied by e-mail (sorry, but I'm not logging IRC at the moment, so those that I forgot about aren't on this list — mail me to remind me of your scores, if you care). If people submitted multiple attempts, I've used the best attempt, and mentioned what the original attempt scored. Within each category, the names are listed roughly in the order I received the answers. Congratulations to those of you who got 4 out of 4!

4 out of 4
  1. David Baron
  2. Chris Hubick
  3. Simon Willson
  4. Jonas Jørgensen
  5. Kris
  6. Matthew Wilson
  7. Markus Niederlechner
  8. Mauricio de Lemos Rodrigues Collares Neto (originally 2 out of 4)
  9. Charles Miller
  10. Matthew Thomas
  11. Christian Biesinger (originally 3 out of 4)
  12. Alexey Chernyak (originally 1 out of 4)
  13. Tom Pike (originally 1 out of 4)
  14. Michael Lefevre (originally 3 out of 4)
3 out of 4
  1. Bill Mason
  2. Josh Soref (originally 0 out of 4)
  3. Jason Fleshman
  4. Manuel
  5. Bradley Baetz
  6. Andy Allan
  7. Jeff Williams
2 out of 4
  1. Micah Sittig
  2. David Illsley
  3. Tim Watt (originally 1 out of 4)
1 out of 4
  1. Devon Y
  2. Philip Waring (originally 0 out of 4)

A few final notes, since this entry is going to be huge now whatever I do, and I haven't yet used the <ul> element today: