Hixie's Natural Log: Whitepaper: Pingback vs Trackback

2002-09-28 00:05 UTC Whitepaper: Pingback vs Trackback

It seems pingback has caused quite a stir in the Web logging and syndication communities! The spec is barely a week old and already I'm seeing pingbacks on sites of people I've never heard of, so implementations are spreading, which is great. It also seems pingback has acted a little like a kick in the backside to the trackback folk, causing them to work on the transparency side of trackback, which is good to see too. (Ironically the one place which did not mention pingback at all is the trackback development Web log.)

There have been many questions asked and assertions made about pingback, and I'm going to try to answer them.

Referrers are enough

I think the best answer to this might be a typical automated referrer list. Look at the number of redundancies, of vague URIs (most referrers are in fact home pages, not permalinks, so the referrer list quickly becomes useless), and of pointless referrers (such as those from webmail systems). Referrers are very noisy: spiders send bogus referrers, and people arrive with referrers that differ only subtly (a trailing slash or no trailing slash, for example). They often don't give you the permalink of the originating resource, and they include many links from news aggregators, blogrolls, search engines and other pages which do not comment on the post. Pingbacks, on the other hand, are reliable because they are explicitly requested, which gives them a near-perfect signal-to-noise ratio (they are only triggered by pages that actually link to the post with comments). The pingback mechanism is also easily extended to provide services such as pingback listing, where a site asks another for its list of pingbacks (see below for ideas on this).

Pingback, Trackback and Referrers are pointless

Well then don't use them. Personally I love finding that other people have commented about my post (every pingback, trackback and new referrer I receive automatically sends me an e-mail). I don't like having comments on my Web log because I believe if you want to comment then you should do so on your own Web log. If I'd wanted to host a discussion forum, I'd have installed discussion forum software, not Web logging software!

Trackback is simpler: In the trackBack model, the client basically does all of the work of auto-discovery and mapping a permalink to a ping URL; in the pingback model, the server does the work.

This is fundamentally incorrect. In the pingback model, the client does the work of finding the pingback server, and then invokes a simple XML-RPC call to that server with the two URIs. In the Trackback model, the client has to do all the work of finding the trackback server, which includes mapping the permalink to the trackback ID by parsing some RDF. The client then has to call the server using the constructed URI, which then (depending on the implementation) has to map this ID back to the permalink.
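
A minimal sketch of that client-side call in Python, using only the standard library. The method name pingback.ping and the argument order (source URI first, then target URI) are taken from the pingback spec rather than from this post, and the function names and example URIs are made up for illustration.

```python
import xmlrpc.client


def build_ping_request(source_uri: str, target_uri: str) -> str:
    """Return the XML-RPC request body for a pingback.ping call.

    source_uri is the page that contains the link; target_uri is the
    page being linked to. Both are plain permalinks.
    """
    return xmlrpc.client.dumps((source_uri, target_uri),
                               methodname="pingback.ping")


def send_ping(server_uri: str, source_uri: str, target_uri: str):
    """POST the call to an already discovered pingback server."""
    proxy = xmlrpc.client.ServerProxy(server_uri)
    return proxy.pingback.ping(source_uri, target_uri)
```

Note that the client never maps the permalink to any other identifier: the two URIs go over the wire as they are.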

Calling pingback's transparency a benefit over trackback is misleading

Having now found out a lot more about trackback, I would agree. There are quite a few much more important benefits which I would call out instead:

Simpler autodiscovery: To detect a pingback server, you have some very simple rules to follow: You look for a particular HTTP header, and failing that, you follow an explicit algorithm to find the URI of the pingback server in the content. The TrackBack Technical Documentation document doesn't say exactly how one is to discover a trackback server, but hints that one should maybe search through the target for an RDF fragment, which one then has to parse (this is rather non-trivial given the flexibility of RDF).
Greater flexibility: You can pingback any document, even plain text files, images, and files which do not know that pingback exists, so long as the document can be served with a pingback HTTP header. Due to its use of RDF, trackback can only correctly be applied to XHTML documents sent as text/xml, application/xml, or (maybe at a pinch) application/xhtml+xml, and, if you are ready to violate the standards, to HTML or XHTML documents sent as text/html. It cannot be easily applied to images (you'd have to include a text chunk within the binary file to store the RDF, and the spec doesn't say how you would find it) and if applied to a plain text document users would be able to see the contents of the RDF.
Less redundancy: Trackback requires you to associate each post with both a permalink and a trackback ID, and uses one for sending trackbacks and one for receiving them. Pingback uses just permalinks, both for the source and the target.
No custom languages: Pingback uses only existing languages for which many high-level toolkits have been written: XML-RPC, HTTP, HTML, etc. Trackback uses all of these and more, and then invents its own additional language: for responses it uses a custom XML vocabulary, which, due to the requirement in the spec that implementations be compatible with future extensions, is non-trivial to interpret.
Easier deployment: To deploy pingback on an entire site requires nothing more than a single line added to a site configuration file. To deploy trackback requires that each permalink be associated with a numeric ID and that each document have an individually customised block of RDF added that uses this ID and duplicates the post's metadata.
Well defined: The trackback specification gives no rules for handling multiple, contradictory, or invalid RDF fragments; it has no well-defined error codes; it doesn't define the autodiscovery mechanism; and so on. The pingback spec defines all of these things in detail.
Sensible semantics in autodiscovery: The trackback RDF actually implies that the trackback server has the title and author of the post, when in fact the opposite is the case (the post is the data and the trackback server the metadata, not the other way around — in fact there is nothing about the trackback RDF to even say that it is anything to do with trackback!). This is an unfortunate abuse of the Dublin Core vocabulary. Pingback was carefully designed to not contain such semantic errors.
Sensible semantics in reporting: To report a trackback, you have to use an HTTP GET request. This violates the HTTP specification, which says that GET requests should be safe, that is, used only for retrieval and free of side effects. Pingback uses a POST, which is defined as the appropriate method for making requests that effect changes on remote servers.
More standards compliant: In addition to requiring violations of the Dublin Core and HTTP specifications, the trackback system requires embedding RDF within the document, which is invalid in HTML, invalid in XHTML-sent-as-text/html, and invalid in strict XHTML-sent-as-text/xml. (Note: I am not saying it is invalid per the W3C validator, I'm saying it is invalid per the specs.) Pingback was carefully designed to not require any violation of existing standards (it extends HTTP, HTML and XML-RPC in the ways recommended by those specifications).
Not site specific: I use the same pingback server, with exactly the same HTTP header, for www.hixie.ch, www.damowmow.com, and my half a dozen other sites. Trackback has to be configured individually for each server (mainly due to the whole ID problem).
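
The autodiscovery rules from the first item in the list above can be sketched roughly as follows. The X-Pingback header name and the link-element pattern are how I read the pingback spec, not something stated in this post; the function name and the example URIs are hypothetical, and fetching the target over HTTP is left out.

```python
import html
import re
from typing import Mapping, Optional

# Pattern for the fallback <link> element, as I understand the spec's
# deliberately strict matching rule (an assumption, check the spec).
LINK_RE = re.compile(r'<link rel="pingback" href="([^"]+)" ?/?>')


def find_pingback_server(headers: Mapping[str, str], body: str) -> Optional[str]:
    """Return the pingback server URI for a fetched target, or None."""
    # Step 1: the HTTP header wins. Header names are case-insensitive.
    for name, value in headers.items():
        if name.lower() == "x-pingback":
            return value.strip()
    # Step 2: fall back to the first matching <link> element in the content.
    match = LINK_RE.search(body)
    if match:
        # html.unescape decodes &amp; and friends (and other entities too).
        return html.unescape(match.group(1))
    return None
```

Because the header check comes first, the same function works for images and plain text files, which have no markup to search.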

Note to self: Read the pingback spec. Form opinion.

What's the verdict, Dave?

Pingback doesn't define how to get hold of the page's metadata

It doesn't need to; that is outside the scope of pingback. There are already many ways to obtain metadata from a URI: for HTML and XHTML documents there are <title> and <meta> elements and lang attributes; for PNG images there are metadata chunks with embedded RDF; and so on. There's simply no reason for pingback to add yet another method of obtaining metadata from documents when so many exist already.
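
For the HTML case, a pingback server that wants a title or author for the source page could pull them out with something like the sketch below. It uses Python's standard html.parser; the class name, function name and the returned dictionary layout are my own choices for illustration, not anything pingback defines.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collect <title>, named <meta> elements and the root lang attribute."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.lang = None
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "html" and attributes.get("lang"):
            self.lang = attributes["lang"]
        elif tag == "title":
            self._in_title = True
        elif tag == "meta" and attributes.get("name"):
            self.meta[attributes["name"].lower()] = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def extract_metadata(html_text: str) -> dict:
    parser = MetadataParser()
    parser.feed(html_text)
    parser.close()
    return {"title": parser.title.strip(), "lang": parser.lang, "meta": parser.meta}
```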

Static Web log users can't benefit from pingback

Not true! Thanks to the pingback proxies even users of static sites can use pingback, converting simple pingbacks into e-mails, referrers, or even trackbacks.
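
To give a feel for what such a proxy does, here is a sketch of only the conversion step, turning one received pingback into an e-mail message. It is not the code behind the actual proxies; the function name, the sender address and the wording of the message are all assumptions.

```python
from email.message import EmailMessage


def pingback_to_email(source_uri: str, target_uri: str, recipient: str,
                      sender: str = "pingback-proxy@example.org") -> EmailMessage:
    """Build a notification e-mail for a single pingback.

    EmailMessage rejects header values containing line breaks, which
    guards against header injection through a malicious URI.
    """
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"Pingback for {target_uri}"
    message.set_content(f"{source_uri} links to {target_uri}\n")
    return message
```

The resulting message could then be handed to smtplib's SMTP.send_message(), so the static site itself never has to run any code.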

Why is pingback using old technologies such as HTTP instead of new technologies such as Jabber?

HTTP and XML-RPC are the most appropriate technologies for the task. Pingback is a stateless transaction, not an online presence or a one-way instant message. HTTP is established and implemented, and it works. Jabber is new, experimental, and not very widely deployed (I use it, mind you; my Jabber ID is ian@hixie.ch, just like my e-mail address). Good language design tries to avoid jumping on bandwagons and sticks to the appropriate protocols for each task.

Trackback works.

Trackback has many limitations in its design, as described above. For many cases, it simply doesn't work.

If pingback's so great, where is the demonstrable code?

Well, there are over a dozen active pingback-enabled Web logs on the Web at the moment, including this one. There is Free code available as well; the links are in the spec.

How about making it two way?

We are currently working on a way of fetching pingbacks; expect to see it in the 2.0 spec. In the meantime, if you want to implement it, use pingback.extensions.getPingbacks(): it takes no arguments and returns an array of strings. Try it on this site if you like; it is supported. You can also call that method on the server behind the spec itself to see who has pingbacked the spec.
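
A sketch of what calling this experimental method might look like from Python. Only the method name, the empty argument list and the array-of-strings return value come from the text above; the helper names are hypothetical, and since the extension is still in flux the details may well change.

```python
import xmlrpc.client
from typing import List


def build_get_pingbacks_request() -> str:
    """Return the XML-RPC request body: a call with no arguments."""
    return xmlrpc.client.dumps((), methodname="pingback.extensions.getPingbacks")


def parse_get_pingbacks_response(body: str) -> List[str]:
    """Extract the array of strings from an XML-RPC response body."""
    params, _method = xmlrpc.client.loads(body)
    return list(params[0])


def fetch_pingbacks(server_uri: str) -> List[str]:
    """Make the call against a live pingback server."""
    proxy = xmlrpc.client.ServerProxy(server_uri)
    return list(proxy.pingback.extensions.getPingbacks())
```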

I almost certainly missed some questions, in which case, please let me know. I'll hopefully be turning this into a FAQ at some point.

In conclusion, I think trackback needs a better defined specification, and has many design flaws which make it inappropriate as a pingback competitor. In the interests of letting the trackback and pingback communities interact, though, I've implemented two scripts that act as gateways to and from trackback and pingback. I'm using the trackback-to-pingback proxy on this site to enable trackback-enabled sites to link to me too. Feedback welcome!
