loge.hixie.ch

Hixie's Natural Log

2002-10-02 02:12 UTC My faceless enemy has been defeated

My arch nemesis over the last couple of weeks has been the very poorly documented Hebrew Traditional Numbering System.

It is insanely complicated.

It is based on addition. For each group of three decimal digits, numbers are selected from a list (which, by the way, is not sequential). There is a digit for each unit number (1 to 9), one for each multiple of ten (10-90) and one for each of the first four hundreds (100-400). First complication: to make numbers higher than 499, you have to use combinations of the hundreds. For example 915 is 400, 400, 100, 10, 5, which is written as תתקיה.

Except it's not. Because the numbers 15 and 16 are too close to the Tetragrammaton (the four-letter name of God). So 15 and 16, instead of being written as 10+5 and 10+6 respectively, are written as 9+6 and 9+7. So 915 is תתקטו. If you are not familiar with bidirectional text, you might think that the first two numbers have been changed instead of the last two, but that just brings up another complication... Hebrew is written right to left, so you have to read Unicode code points left to right and compare them to characters going right to left.
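To make the additive rule and the 15/16 exception concrete, here is a minimal Python sketch for a single group (1 to 999). It is not the script mentioned in this post; the function name `hebrew_group` and the letter tables are my own illustration, and it ignores thousands, punctuation and the optional reorderings. Letters are written as Unicode escapes so that bidirectional rendering cannot scramble the source.

```python
# Letter tables in logical (code point) order. Final letter forms are
# skipped, which is why the list is "not sequential" in Unicode terms.
UNITS = "\u05D0\u05D1\u05D2\u05D3\u05D4\u05D5\u05D6\u05D7\u05D8"  # 1..9
TENS = "\u05D9\u05DB\u05DC\u05DE\u05E0\u05E1\u05E2\u05E4\u05E6"   # 10..90
HUNDREDS = "\u05E7\u05E8\u05E9\u05EA"                              # 100..400


def hebrew_group(n: int) -> str:
    """Convert 1..999 to Hebrew letters (hypothetical helper, one group only)."""
    if not 1 <= n <= 999:
        raise ValueError("this sketch only handles 1 to 999")
    out = []
    hundreds, rest = divmod(n, 100)
    # Only 100..400 have letters, so bigger hundreds are built greedily:
    # 900 becomes 400 + 400 + 100.
    while hundreds > 0:
        step = min(hundreds, 4)
        out.append(HUNDREDS[step - 1])
        hundreds -= step
    if rest in (15, 16):
        # Written as 9+6 and 9+7 instead of 10+5 and 10+6.
        out.append(UNITS[8])          # 9
        out.append(UNITS[rest - 10])  # 6 for 15, 7 for 16
    else:
        tens, units = divmod(rest, 10)
        if tens:
            out.append(TENS[tens - 1])
        if units:
            out.append(UNITS[units - 1])
    return "".join(out)
```

The returned string is in logical order, so a bidi-aware display will render it right to left; compare code points, not what you see on screen.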

Of course it doesn't end there. If you are paying attention you'll have noticed there is no number for zero. This makes writing numbers like 1000016 rather hard. So the word for thousand ‏(אלפי)‏ is used instead. Except that it is repeated the number of times required to get to the group which had the zero: in this case twice, since the 1 is a thousand thousand (otherwise known as a million). So 1000016 is א אלפי אלפי יו. Oops, forgot about the issue with 16. Oh, and there's another problem. The last occurrence of the "thousand" word in each chain of such words has to have a special letter added at the end. So it is: א אלפי אלפים טז.

Except it's not. 1000 is a special case, you see. It is grammatically incorrect to just stick the word for "thousand" after the number for one. You have to use a special form for it instead. Which makes it אלף אלפים טז.

There's a similar rule for 2000. Except that it only applies at the end of a word, not if it is followed by more zeros. So two million is ב אלפי אלפים but two thousand is אלפיים.

Oh, one more thing. All of those numbers are wrong because I didn't add the special characters indicating they were numbers, not words. You add one special punctuation character to each group of one character, and another to groups of more than one character. The second of these, though, doesn't go at the end of the group; it goes just before the last character. And you never add these characters to groups consisting of the word for thousands. You have to add these characters if the numbers are used in prose, but mustn't if they are used in lists. And if you think that is complicated, wait till you hear about the (thankfully optional) further reorderings that can be made, such as 298 being written as 200+8+90 to avoid spelling the word for murder.
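The punctuation rule can be sketched in a few lines of Python. The post does not name the two characters; I am assuming they are the geresh (U+05F3) and gershayim (U+05F4), and that the single-letter mark goes after the letter. The function name `mark_group` and the `in_prose` flag are hypothetical, and groups made of the word for thousands are assumed to be filtered out by the caller.

```python
GERESH = "\u05F3"     # assumed mark for one-letter groups
GERSHAYIM = "\u05F4"  # assumed mark for groups of two or more letters


def mark_group(letters: str, in_prose: bool = True) -> str:
    """Add the number-marking punctuation to one group of letters."""
    if not in_prose or not letters:
        # In lists the marks must not be added at all.
        return letters
    if len(letters) == 1:
        return letters + GERESH
    # For longer groups the mark goes just before the last letter,
    # not at the end of the group.
    return letters[:-1] + GERSHAYIM + letters[-1]
```

For example, the two letters for 15 (9 then 6) come back with the second mark sitting between them.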

I wrote a script to convert numbers into the Hebrew numbering system. I think it works for all numbers from zero up to some high number (probably 2^32). I also wrote, with a lot of help from Simon Montagu, pretty detailed documentation for all the rules (including codepoints) that apply to this ridiculously complicated numbering system. It will be in the next public draft of the CSS3 Lists Module.

2002-09-28 00:05 UTC Whitepaper: Pingback vs Trackback

It seems pingback has caused quite a stir in the Web logging and syndication communities! The spec is barely a week old and already I'm seeing pingbacks on sites of people I've never heard of, so implementations are spreading, which is great. It also seems pingback has acted a little like a kick in the backside to the trackback folk, causing them to work on the transparency side of trackback, which is good to see too. (Ironically the one place which did not mention pingback at all is the trackback development Web log.)

There have been many questions asked and assertions made about pingback, and I'm going to try to answer them.

Referrers are enough
I think the best answer to this might be a typical automated referrer list. Look at the number of redundancies, of vague URIs (most referrers are in fact home pages, not permalinks, so the referrer list quickly becomes useless), and of pointless referrers (such as those from webmail systems). Referrers are very noisy (spiders include bogus referrers, people come to sites with only subtly different referrers such as a trailing slash or no trailing slash, etc), referrers often don't give you the permalink of the originating resource, and referrers contain many links from news aggregators, blogrolls, search engines and other pages which do not contain comments on the post. On the other hand, pingbacks are reliable because they are explicitly requested, which gives them a near perfect signal to noise ratio (they are only triggered by pages that actually link to the post with comments), and the pingback mechanism is easily extended to provide services such as pingback listing (when a site asks another for the list of pingbacks — see below for ideas on this).
Pingback, Trackback and Referrers are pointless
Well then don't use them. Personally I love finding that other people have commented about my post (every pingback, trackback and new referrer I receive automatically sends me an e-mail). I don't like having comments on my Web log because I believe if you want to comment then you should do so on your own Web log. If I'd wanted to host a discussion forum, I'd have installed discussion forum software, not Web logging software!
Trackback is simpler: In the trackBack model, the client basically does all of the work of auto-discovery and mapping a permalink to a ping URL; in the pingback model, the server does the work.
This is fundamentally incorrect. In the pingback model, the client does the work of finding the pingback server, and then invokes a simple XML-RPC call to that server with the two URIs. In the Trackback model, the client has to do all the work of finding the trackback server, which includes mapping the permalink to the trackback ID by parsing some RDF. The client then has to call the server using the constructed URI, which then (depending on the implementation) has to map this ID back to the permalink.
Calling pingback's transparency a benefit over trackback is misleading
Having now found out a lot more about trackback, I would agree. There are quite a few much more important benefits which I would call out instead:
Simpler autodiscovery
To detect a pingback server, you have some very simple rules to follow: You look for a particular HTTP header, and failing that, you follow an explicit algorithm to find the URI of the pingback server in the content. The TrackBack Technical Documentation document doesn't say exactly how one is to discover a trackback server, but hints that one should maybe search through the target for an RDF fragment, which one then has to parse (this is rather non-trivial given the flexibility of RDF).
Greater flexibility
You can pingback any document, even plain text files, images, and files which do not know that pingback exists, so long as the document can be served with a pingback HTTP header. Due to its use of RDF, trackback can only correctly be applied to XHTML documents sent as text/xml, application/xml, or (maybe at a pinch) application/xhtml+xml, and, if you are ready to violate the standards, to HTML or XHTML documents sent as text/html. It cannot be easily applied to images (you'd have to include a text chunk within the binary file to store the RDF, and the spec doesn't say how you would find it) and if applied to a plain text document users would be able to see the contents of the RDF.
Less redundancy
Trackback requires you to associate each post with both a permalink and a trackback ID, and uses one for sending trackbacks and one for receiving them. Pingback uses just permalinks, both for the source and the target.
No custom languages
Pingback uses only existing languages for which many high-level toolkits have been written: XML-RPC, HTTP, HTML, etc. Trackback uses all of these and more, and then invents its own additional language: for responses it uses a custom XML vocabulary, which, due to the requirement in the spec that implementations be compatible with future extensions, is non-trivial to interpret.
Easier deployment
To deploy pingback on an entire site requires nothing more than a single line added to a site configuration file. To deploy trackback requires that each permalink be associated with a numeric ID and that each document have an individually customised block of RDF added that uses this ID and duplicates the post's metadata.
Well defined
The trackback specification gives no rules for how to handle multiple, contradictory or invalid RDF fragments; it has no well defined error codes; it doesn't define the autodiscovery mechanism; and so on. The pingback spec defines all of these things in detail.
Sensible semantics in autodiscovery
The trackback RDF actually implies that the trackback server has the title and author of the post, when in fact the opposite is the case (the post is the data and the trackback server the metadata, not the other way around — in fact there is nothing about the trackback RDF to even say that it is anything to do with trackback!). This is an unfortunate abuse of the Dublin Core vocabulary. Pingback was carefully designed to not contain such semantic errors.
Sensible semantics in reporting
To report a trackback, you have to use an HTTP GET request. This is a violation of the HTTP specification, which says that GET requests must be idempotent. Pingback uses a POST, which is defined as being the appropriate method to use to make requests and effect changes on remote servers.
More standards compliant
In addition to requiring violations of the Dublin Core and HTTP specifications, the trackback system requires embedding RDF within the document, which is invalid in HTML, invalid in XHTML-sent-as-text/html, and invalid in strict XHTML-sent-as-text/xml. (Note: I am not saying it is invalid per the W3C validator, I'm saying it is invalid per the specs.) Pingback was carefully designed to not require any violation of existing standards (it extends HTTP, HTML and XML-RPC in the ways recommended by those specifications).
Not site specific
I use the same pingback server, with exactly the same HTTP header, for www.hixie.ch, www.damowmow.com, and my half a dozen other sites. Trackback has to be configured individually for each server (mainly due to the whole ID problem).
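The autodiscovery rules mentioned under "Simpler autodiscovery" above (header first, then an explicit search of the content) can be sketched roughly as follows. The header name `X-Pingback`, the `<link rel="pingback">` pattern and the four entities to decode are taken from my reading of the Pingback spec, not from this post, so treat them as assumptions and check the spec before relying on them. The function name is hypothetical, and fetching the page is left to the caller.

```python
import re
from typing import Mapping, Optional

# Assumed from the Pingback spec: a deliberately rigid pattern, so that
# clients do not need a full HTML parser.
LINK_RE = re.compile(r'<link rel="pingback" href="([^"]+)" ?/?>')


def discover_pingback_server(headers: Mapping[str, str], body: str) -> Optional[str]:
    """Return the pingback server URI for a fetched resource, or None."""
    # Step 1: the HTTP header wins. This works for any resource,
    # including images and plain text.
    for name, value in headers.items():
        if name.lower() == "x-pingback":
            return value.strip()
    # Step 2: fall back to the link element in the content.
    match = LINK_RE.search(body)
    if match is None:
        return None
    uri = match.group(1)
    # Decode the basic entities; "&amp;" goes last to avoid double decoding.
    for entity, char in (("&lt;", "<"), ("&gt;", ">"), ("&quot;", '"'), ("&amp;", "&")):
        uri = uri.replace(entity, char)
    return uri
```

Because the header check needs no knowledge of the document format, one server-wide header is enough to enable every resource on a site.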
Note to self: Read the pingback spec. Form opinion.
What's the verdict, Dave?
Pingback doesn't define how to get hold of the page's metadata
It doesn't need to, that is out of the scope of pingback. To obtain metadata out of a URI, you already have many methods: for HTML and XHTML documents there are <title> and <meta> elements and lang attributes. For PNG images there are metadata chunks with embedded RDF. And so on. There's simply no reason for pingback to add yet another method of obtaining metadata out of documents when so many exist already.
Static Web log users can't benefit from pingback
Not true! Thanks to the pingback proxies even users of static sites can use pingback, converting simple pingbacks into e-mails, referrers, or even trackbacks.
Why is pingback using old technologies such as HTTP instead of new technologies such as jabber?
HTTP and XML-RPC are the most appropriate technologies for the task. Pingback is a stateless transaction, not an online presence or a one-way instant message. HTTP is established and implemented, and it works. Jabber is new, experimental, and not very widely spread (I use it, mind you; my Jabber ID is ian@hixie.ch just like my e-mail address). Good language design tries to avoid jumping on bandwagons, sticking to the appropriate protocols for each task.
Trackback works.
Trackback has many limitations in its design, as described above. For many cases, it simply doesn't work.
If pingback's so great, where is the demonstrable code?
Well, there are over a dozen active pingback-enabled Web logs on the Web at the moment, including this one. Free code is available as well; the links are in the spec.
How about making it two way?
We are currently working on a way of fetching pingbacks; expect to see it in the 2.0 spec. At the moment, if you want to implement it, use pingback.extensions.getPingbacks(): it takes no arguments and returns an array of strings. Try it on this site if you like, it is supported. You can also use that method on the server behind the spec itself to see who has pingbacked the spec.
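For the curious, here is a rough Python sketch of what these XML-RPC calls look like on the wire, using only the standard library's `xmlrpc.client`. The method name `pingback.ping` with (source URI, target URI) as its arguments is my reading of the Pingback spec rather than something stated in this post; `pingback.extensions.getPingbacks` is the experimental method described above. The helper names are hypothetical, and nothing here touches the network: POSTing the body to the discovered server is left out.

```python
import xmlrpc.client
from typing import List


def build_ping_request(source_uri: str, target_uri: str) -> str:
    """XML-RPC request body announcing that source_uri links to target_uri."""
    return xmlrpc.client.dumps((source_uri, target_uri), methodname="pingback.ping")


def build_get_pingbacks_request() -> str:
    """XML-RPC request body for the experimental listing method (no arguments)."""
    return xmlrpc.client.dumps((), methodname="pingback.extensions.getPingbacks")


def parse_pingback_list(response_xml: str) -> List[str]:
    """Extract the array of strings from a getPingbacks response."""
    params, _method = xmlrpc.client.loads(response_xml)
    return list(params[0])
```

Both URIs in the ping are plain permalinks, so no separate ID has to be looked up or mapped on either side.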

I almost certainly missed some questions, in which case, please let me know. I'll hopefully be turning this into a FAQ at some point.

In conclusion, I think trackback needs a better defined specification, and has many design flaws which make it inappropriate as a pingback competitor. In the interests of letting the trackback and pingback communities interact, though, I've implemented two scripts that act as gateways to and from trackback and pingback. I'm using the trackback-to-pingback proxy on this site to enable trackback-enabled sites to link to me too. Feedback welcome!


2002-09-26 10:25 UTC Fire in the hole!!!

I highly recommend taking cover behind a nice big safe rock, then pulling up a chair and watching the ensuing battle that this TAG e-mail will almost certainly trigger.


2002-09-24 12:41 UTC Grumble

Sore throat. Grr.

2002-09-23 22:32 UTC Movie: Swimfan

Swimfan has a point to make ("don't cheat on your girlfriend, no really, we mean it, don't cheat on her") and it makes no secret of this fact.

Technically, it wasn't too bad. The plot did hold together, and the acting was of a reasonable quality. The editing was a little too in-your-face for my liking, but I got used to it. It just wasn't a particularly exciting movie.