loge.hixie.ch

Hixie's Natural Log

2006-08-07 11:27 UTC The sacrifice of pragmatism over theoretical purity

A while back I mentioned that the HTTP Content-Type header was effectively dead on the Web, with Web browsers being forced by market demand to ignore authoritative metadata regarding data formats, in favour of having Web browsers automatically determine the type of the content the user is viewing.

In particular, this is required for video on the Web — millions of videos are served as text/plain and a Web browser that displays garbage instead of showing the video is one that will get very few happy users — and for feeds (RSS, Atom, etc), which are more often than not served as text/html despite being XML. (As a side note, a little over two years ago Mark Pilgrim pointed out that XML on the Web was dead too, also because of feeds. Seems these popular feeds could be responsible for the destruction of the Web!) It's also needed for images — many images are sent using the completely wrong MIME type, e.g. GIFs as image/png, or PNGs as text/plain, or JPEGs as application/octet-stream, and browsers uniformly ignore the MIME type when it comes to <img> elements. Same with scripts — it doesn't matter what you label your JavaScript files as, at all; when it comes to <script> elements, browsers uniformly ignore the HTTP Content-Type header and rely exclusively on the attributes on the element.
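To make the image case concrete, here is a minimal sketch of the kind of sniffing described above. It is not any particular browser's algorithm: the `sniff_image_type` function and the `MAGIC_SIGNATURES` table are names made up for illustration, and the only facts assumed are the well-known leading bytes of GIF, PNG and JPEG files.

```python
# Hypothetical sketch: guess an image format from its leading bytes,
# ignoring whatever Content-Type the server declared.
MAGIC_SIGNATURES = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
]


def sniff_image_type(body: bytes, declared_type: str) -> str:
    """Return the sniffed image type, falling back to the declared one."""
    for signature, mime_type in MAGIC_SIGNATURES:
        if body.startswith(signature):
            return mime_type
    # Nothing recognised: the header is the only information left.
    return declared_type


if __name__ == "__main__":
    # A GIF mislabelled as image/png is still treated as a GIF.
    gif_bytes = b"GIF89a" + b"\x00" * 10
    print(sniff_image_type(gif_bytes, "image/png"))  # image/gif
```

The header only matters here as a last resort, which is roughly the situation with <img> elements today.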

However, it seems that these real-world concerns are not a factor in the TAG's findings, since the day after I posted the aforementioned blog post, they published a document describing how browsers must always follow Content-Type headers, how specifications must never require browsers to ignore such headers, and how authors must all go and correct their misconfigured servers. (I only recently became aware of this document; I don't normally follow the W3C TAG.)

I'm curious as to how they're going to go about making people follow their recommendations. Until the servers are fixed, browsers can't fix their behaviour, because users would consider the spec-compliant behaviour to be a regression ("the previous version showed the video, the new version just shows garbage"). So presumably they are going to get all the servers fixed first.

But how?

Until then, HTML5 is going to continue the pragmatic route, and will be requiring UAs to ignore Content-Type in certain situations (mostly those I mentioned above). It sucks, but I'd rather have the spec be implementable in the face of real-world content than have it be uniformly ignored.
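As a rough illustration of what "ignore Content-Type in certain situations" could look like, here is a sketch that picks the source of truth for a resource's type based on where the resource is used. This is not the HTML5 text, just the cases from this post written down; the `TypeSource` enum, the `type_source_for` function and the context names are all hypothetical.

```python
from enum import Enum


class TypeSource(Enum):
    """Where a UA looks to decide what kind of content it has."""
    CONTENT_TYPE_HEADER = "header"
    CONTENT_SNIFFING = "sniffing"
    ELEMENT_ATTRIBUTES = "attributes"


# Hypothetical context names covering the cases mentioned in this post.
SNIFFED_CONTEXTS = {"img", "video", "feed"}
ATTRIBUTE_CONTEXTS = {"script"}


def type_source_for(context: str) -> TypeSource:
    """Pick the source of truth for a resource's type in a given context."""
    if context in SNIFFED_CONTEXTS:
        return TypeSource.CONTENT_SNIFFING
    if context in ATTRIBUTE_CONTEXTS:
        return TypeSource.ELEMENT_ATTRIBUTES
    # Everywhere else the header is still honoured.
    return TypeSource.CONTENT_TYPE_HEADER


if __name__ == "__main__":
    for context in ("img", "script", "stylesheet"):
        print(context, type_source_for(context).value)
```

The point of the sketch is only that the header stops being authoritative in a fixed, enumerable set of contexts, rather than everywhere.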