Hixie's Natural Log: Content negotiation in heterogenous XML environments

2002-11-08 14:53 UTC Content negotiation in heterogenous XML environments

I recently came across RFC 3023, and specifically its appendix A. The RFC normatively updates RFC 2048, and it is well written, clear and understandable.

The problem that this specification tries to solve is which MIME types to use for XML documents that mix several namespaces. For example, a server could serve one URI as five different documents: pure XHTML, XHTML+SVG with the root element being XHTML, SVG+XHTML with the root element being SVG, XHTML+MathML with the root element being XHTML, and some typical RDF with embedded XHTML. (Typical RDF documents have about 329 namespaces in them.)

In the past, I'd been of the opinion that text/xml should be the True XML MIME type, so there would just be one MIME type for all five. RFC 3023 points out that some applications might want to process documents based only on their overall type, especially for something like images, so under its scheme the five documents would be labelled as follows:

MIME types of XML documents
Document	MIME type
XHTML	`application/xhtml+xml`
XHTML+SVG	`application/xhtml+xml`
SVG+XHTML	`image/svg+xml`
XHTML+MathML	`application/xhtml+xml`
RDF	`application/rdf+xml`

Basically, the most important namespace is what decides the MIME type. This is often, but not always, the namespace of the root element; XSLT compound documents are an example where the most important namespace (XSLT's) is not the same as the root element's (which probably has no namespace at all, or some private namespace which the XSLT transformation sheet can turn into a well-known namespace). The important thing to remember, though, is that in this model the exact MIME type isn't really that important: the +xml suffix is the key. A UA like Mozilla, which is basically a generic XML+CSS UA with some special support for certain namespaces, would treat all MIME types ending with this suffix in the same way, only using the namespaces to dispatch the data to the right processor.

So you could label an XHTML document as application/rdf+xml and it really wouldn't make any difference to generic XML processors. (Of course, doing so would be incorrect, since XHTML is not RDF.)
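
As a rough sketch of what this model means for a generic UA, here is some Python. The processor names and the dispatch function are made up for illustration (this is not how Mozilla is actually implemented); the only things taken from the model are the +xml suffix check and the per-element namespace lookup.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical processor names, for illustration only.
PROCESSORS = {XHTML_NS: "xhtml", SVG_NS: "svg", MATHML_NS: "mathml"}


def is_xml_type(mime_type):
    """True for text/xml, application/xml and any type with a +xml suffix."""
    essence = mime_type.split(";", 1)[0].strip().lower()
    return essence in ("text/xml", "application/xml") or essence.endswith("+xml")


def namespace_of(tag):
    """Extract the namespace URI from ElementTree's '{uri}local' tag form."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def dispatch(mime_type, document):
    """Return (local name, processor) for every element, in document order."""
    if not is_xml_type(mime_type):
        raise ValueError("not an XML media type: " + mime_type)
    root = ET.fromstring(document)
    result = []
    for elem in root.iter():
        local = elem.tag.rsplit("}", 1)[-1]
        # Elements in unknown namespaces fall back to generic XML handling.
        result.append((local, PROCESSORS.get(namespace_of(elem.tag), "generic-xml")))
    return result
```

Feeding this an XHTML document with embedded SVG gives the same dispatch whether it is labelled application/xhtml+xml or (incorrectly) application/rdf+xml, since only the suffix and the namespaces are consulted.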

The first problem with all this is that it doesn't really help with content negotiation. If you have a UA that supports XHTML and SVG, you could make it say so by claiming:

Accept: text/xml,application/xml,application/xhtml+xml,image/svg+xml

However, this wouldn't help the server know whether to return the XHTML version, the XHTML+SVG version, the SVG+XHTML version, or the XHTML+MathML version, because they are all labelled as recognised MIME types.
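
To make the ambiguity concrete, here is a deliberately simplified Python sketch of the server side. It assumes exact matching only (no wildcards, no q-values), and the variant names are just labels for the five documents in the table above.

```python
# The five variants from the table, keyed by illustrative names.
VARIANTS = {
    "XHTML": "application/xhtml+xml",
    "XHTML+SVG": "application/xhtml+xml",
    "SVG+XHTML": "image/svg+xml",
    "XHTML+MathML": "application/xhtml+xml",
    "RDF": "application/rdf+xml",
}


def parse_accept(header):
    """Return the set of media types listed in an Accept header value.

    Simplification: parameters such as q-values are dropped, not interpreted.
    """
    return {
        item.split(";", 1)[0].strip().lower()
        for item in header.split(",")
        if item.strip()
    }


def acceptable_variants(accept_value):
    """Return the sorted names of all variants whose MIME type is accepted."""
    accepted = parse_accept(accept_value)
    return sorted(name for name, mime in VARIANTS.items() if mime in accepted)


candidates = acceptable_variants(
    "text/xml,application/xml,application/xhtml+xml,image/svg+xml"
)
```

Here candidates ends up holding four of the five variants. Only the RDF one is ruled out, and nothing in the header lets the server choose among the rest.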

This problem could be solved by an approach such as that given in Simon St.Laurent's xmlns Media Feature Tag proposal, except Media Features (see RFC 2295 and RFC 2533) are not well implemented and are massively more complicated than necessary. (These two statements are probably related.)

The second problem with saying that there should be an arbitrary number of MIME types is that it would result in excessively large Accept headers. It also means that each time a new XML type is registered, every generic XML UA has to be updated to know about it.

So what we need is two-fold. First, UAs should be able to state that they support XML, any XML. Something along the lines of the following (but note that this is invalid):

Accept: text/xml,application/xml,*/*+xml

Secondly, we need a simple way for UAs to explicitly state the namespaces they support so that servers that support XML can know which documents to serve when there are multiple alternatives.
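
On the first point, here is a small Python sketch of why the header above doesn't do what it looks like it does. It assumes, as I understand the HTTP/1.1 rules, that a media range can only take the forms `*/*`, `type/*` and `type/subtype`; the function name is mine.

```python
def range_matches(media_range, mime_type):
    """Match a MIME type against a media range using the three allowed forms."""
    r_type, _, r_sub = media_range.strip().lower().partition("/")
    m_type, _, m_sub = mime_type.strip().lower().partition("/")
    if r_type == "*" and r_sub == "*":
        return True  # */* matches anything
    if r_sub == "*":
        return r_type == m_type  # type/* matches any subtype of that type
    # Anything else is compared literally, so "*+xml" is not a wildcard.
    return (r_type, r_sub) == (m_type, m_sub)
```

With these rules, `range_matches("*/*+xml", "application/xhtml+xml")` is false: the subtype `*+xml` is just an odd literal string, not a suffix wildcard, so a UA sending it is not telling the server anything useful.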
