My arch-nemesis over the last couple of weeks has been the very poorly documented Hebrew Traditional Numbering System.
It is insanely complicated.
It is based on addition. For each group of three decimal digits, numbers
are selected from a list (which, by the way, is not sequential). There is a digit for each unit number (1 to 9), one for each
multiple of ten (10-90) and one for each of the first four hundreds (100-400). First complication: to make
numbers higher than 499, you have to use combinations of the hundreds. For example 915 is 400, 400, 100, 10, 5,
which is written as תתקיה.
Except it's not. Because numbers 15 and 16 are too close to the Tetragrammaton (the four-letter name of God).
So 15 and 16, instead of being written as 10+5 and 10+6 respectively, are written as 9+6 and 9+7. So 915 is
תתקטו. If you are not familiar with bidirectional text, you might think that the
first two numbers have been changed instead of the last two, but that just brings up another complication... Hebrew
is written right to left, so you have to read Unicode code points left to right and compare them to characters going
right to left.
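As a rough sketch of the rules so far (one group of up to three decimal digits, the repeated 400s, and the 15/16 exception), something like the following Python would do. The function name `hebrew_group` is made up for illustration, and this is not the script mentioned later in this post. The letters are written as Unicode escapes in logical order, which also shows that the code points are not sequential.

```python
# Letter tables for the additive system, in logical (storage) order.
# The code points skip the final-form letters, so the tens are not contiguous.
UNITS = "\u05D0\u05D1\u05D2\u05D3\u05D4\u05D5\u05D6\u05D7\u05D8"   # 1..9
TENS = "\u05D9\u05DB\u05DC\u05DE\u05E0\u05E1\u05E2\u05E4\u05E6"    # 10..90
HUNDREDS = "\u05E7\u05E8\u05E9\u05EA"                              # 100..400


def hebrew_group(n: int) -> str:
    """Convert one group (1 to 999) to Hebrew letters, without punctuation."""
    if not 1 <= n <= 999:
        raise ValueError("this sketch only handles a single group: 1 to 999")
    letters = []
    hundreds, rest = divmod(n, 100)
    # Hundreds above 400 are made by repeating 400 and adding the remainder.
    while hundreds >= 4:
        letters.append(HUNDREDS[3])
        hundreds -= 4
    if hundreds:
        letters.append(HUNDREDS[hundreds - 1])
    if rest in (15, 16):
        # Written as 9+6 and 9+7 instead of 10+5 and 10+6.
        letters.append(UNITS[8])
        letters.append(UNITS[rest - 10])
    else:
        tens, units = divmod(rest, 10)
        if tens:
            letters.append(TENS[tens - 1])
        if units:
            letters.append(UNITS[units - 1])
    return "".join(letters)


print(hebrew_group(915))  # the five letters 400, 400, 100, 9, 6
```

The returned string is in logical order; how it looks on screen depends on the bidirectional rendering of whatever displays it.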
Of course it doesn't end there. If you are paying attention you'll have noticed there is no number for zero. This
makes writing numbers like 1000016 rather hard. So the word for thousand (אלפי) is used
instead. Except that it is repeated the number of times required to get to the group which had the zero — in
this case twice, since the 1 is a thousand thousand (otherwise known as a million). So 1000016 is
א אלפי אלפי יו.
Oops, forgot about the issue with 16. Oh and there's another problem. The last occurrence of the "thousand" word in each
chain of such words has to have a special letter added at the end. So it is:
א אלפי אלפים טז.
Except it's not. 1000 is a special case, you see. It is grammatically incorrect to just stick the word for "thousand"
after the number for one. You have to use a special form for it instead. Which makes it
אלף אלפים טז.
There's a similar rule for 2000. Except that it only applies at the end of a word, not if it is followed by more zeros. So two million is ב אלפי אלפים
but two thousand is אלפיים.
Oh, one more thing. All of those numbers are wrong because I didn't add special characters indicating they were
numbers, not words. You add one special punctuation character to each group of one character, and another to groups of more than one character. The second of these, though, doesn't go at the end of the group; it goes just before the last character. And you never add these characters to groups consisting of the word for thousands. You have to add these characters if the numbers are used in prose, but mustn't if they are used in lists.
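A minimal sketch of that punctuation rule in Python follows. It assumes the two characters are the Hebrew geresh (U+05F3) and gershayim (U+05F4), and that the single-letter mark goes after the letter; neither detail is spelled out above, so treat both as assumptions. The name `mark_group` and the `in_prose` flag are invented for the example, and groups made of the word for thousands are assumed to be skipped by the caller.

```python
GERESH = "\u05F3"     # assumed mark for a group of one letter
GERSHAYIM = "\u05F4"  # assumed mark for a group of several letters


def mark_group(letters: str, in_prose: bool = True) -> str:
    """Add the 'this is a number' mark to one group of Hebrew letters."""
    if not in_prose or not letters:
        # In lists the marks must not be added at all.
        return letters
    if len(letters) == 1:
        # Assumption: the single-letter mark follows the letter.
        return letters + GERESH
    # The multi-letter mark goes just before the last letter, not at the end.
    return letters[:-1] + GERSHAYIM + letters[-1]


sixteen = "\u05D8\u05D6"  # 9+7, the accepted spelling of 16
print(mark_group(sixteen))
print(mark_group(sixteen, in_prose=False))
```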
And if you think that is complicated, wait till you hear about the (thankfully optional) further reorderings
that can be made, such as 298 being written as 200+8+90 to avoid spelling the word for murder.
I wrote a script to convert numbers
into the Hebrew numbering system. I think it works for all numbers from zero up to some high number (probably 2^32).
I also wrote, with a lot of help from Simon Montagu, pretty detailed documentation for all the rules (including codepoints)
that apply to this ridiculously complicated numbering system. It will be in the next public draft of the
CSS3 Lists Module.
It seems pingback has caused quite a stir in the Web logging and syndication communities! The spec is barely a week old and already
I'm seeing pingbacks on sites of people I've never heard of, so implementations are spreading, which is great. It also seems pingback
has acted a little like a kick in the backside to the trackback folk, causing them to work on the transparency side of trackback,
which is good to see too. (Ironically the one place which did not mention pingback at all is the trackback
development Web log.)
There have been many questions asked and assertions made about pingback, and I'm going to try to answer them.
- Referrers are enough
- I think the best answer to this might be a
typical automated referrer list. Look at the number of redundancies, of vague URIs (most referrers are in fact home pages,
not permalinks, so the referrer list quickly becomes useless), and of pointless referrers (such as those from webmail systems).
Referrers are very noisy (spiders include bogus referrers, people come to
sites with only subtly different referrers such as a trailing slash or no
trailing slash, etc.), referrers often don't give you the permalink of the
originating resource, and referrers contain many links from news aggregators,
blogrolls, search engines and other pages which do not contain comments
on the post. On the other hand, pingbacks are reliable because they are explicitly requested, which gives them
a near-perfect signal-to-noise ratio (they are only triggered by
pages that actually link to the post with comments), and the pingback mechanism
is easily extended to provide services such as pingback listing (when a
site asks another for the list of pingbacks — see below for ideas on this).
- Pingback, Trackback and Referrers are pointless
- Well then don't use them. Personally I love finding that other people have commented about my post (every
pingback, trackback and new referrer I receive automatically sends me an e-mail). I don't like having comments
on my Web log because I believe if you want to comment then you should do so on your own Web log. If I'd wanted
to host a discussion forum, I'd have installed discussion forum software, not Web logging software!
- Trackback is simpler:
In the trackBack model, the
client basically does all of the work of auto-discovery and mapping a permalink to a ping URL; in the
pingback model, the server does the work.
- This is fundamentally incorrect.
In the pingback model, the client does the work of finding the pingback server, and then invokes a simple XML-RPC
call to that server with the two URIs. In the Trackback model, the client has to do all the work of finding the
trackback server, which includes mapping the permalink to the trackback ID by parsing some RDF. The client then has to call the server
using the constructed URI, which then (depending on the implementation) has to map this ID back to the permalink.
- Calling pingback's transparency a benefit over trackback is misleading
- Having now found out a lot more about trackback, I would agree. There are quite a few
much more important benefits which I would call out instead:
- Simpler autodiscovery
- To detect a pingback server, you have some
very simple rules to follow:
You look for a particular HTTP header, and failing that, you follow an explicit
algorithm to find the URI of the pingback server in the content.
The TrackBack Technical Documentation document
doesn't say exactly how one is to discover a trackback server, but hints that one should maybe search
through the target for an RDF fragment, which one then has to parse (this is rather non-trivial given the
flexibility of RDF).
- Greater flexibility
- You can pingback any document, even plain text files, images, and files which do not know that pingback exists, so long as
the document can be served with a pingback HTTP header. Due to its use of RDF, trackback can only correctly be applied to XHTML
documents sent as text/xml, application/xml, or (maybe at a pinch) application/xhtml+xml, and, if you are ready to violate the standards,
to HTML or XHTML documents sent as text/html. It cannot be easily applied to images (you'd have to include a text chunk within the binary
file to store the RDF, and the spec doesn't say how you would find it) and if applied to a plain text document users would be able to
see the contents of the RDF.
- Less redundancy
- Trackback requires you to associate each post with both a permalink and a trackback ID, and uses one for sending
trackbacks and one for receiving them. Pingback uses just permalinks, both for the source and the target.
- No custom languages
- Pingback uses only existing languages for which many high-level toolkits have been written: XML-RPC, HTTP, HTML, etc.
Trackback uses all of these and more, and then invents its own additional language: for responses it uses a custom XML vocabulary, which,
due to the requirement in the spec that implementations be compatible with future extensions, is non-trivial to interpret.
- Easier deployment
- To deploy pingback on an entire site requires nothing more than a single line added to a site configuration file. To deploy trackback
requires that each permalink be associated with a numeric ID and that each document have an individually customised block of RDF
added that uses this ID and duplicates the post's metadata.
- Well defined
- The trackback specification gives no rules for how to handle multiple or contradictory RDF fragments and invalid RDF fragments,
it has no well defined error codes, it doesn't define the
autodiscovery mechanism, etc. The pingback spec defines all of these things in detail.
- Sensible semantics in autodiscovery
- The trackback RDF actually implies that the trackback server has the title and author of the post, when in fact the
opposite is the case (the post is the data and the trackback server the metadata, not the other way around — in fact
there is nothing about the trackback RDF to even say that it is anything to do with trackback!). This is an
unfortunate abuse of the Dublin Core vocabulary. Pingback was carefully designed to not contain such semantic errors.
- Sensible semantics in reporting
- To report a trackback, you have to use an HTTP GET request. This is a violation of the HTTP specification, which
says that GET requests should be safe, that is, used only for retrieval and free of side effects. Pingback uses a POST, which is defined as being the appropriate method
to use to make requests and effect changes on remote servers.
- More standards compliant
- In addition to requiring violations of the Dublin Core and HTTP specifications, the trackback system requires embedding RDF within
the document, which is invalid in HTML, invalid in XHTML-sent-as-text/html, and invalid in strict XHTML-sent-as-text/xml. (Note: I am
not saying it is invalid per the W3C validator, I'm saying it is invalid per the specs.)
Pingback was carefully designed to not require any violation of existing standards (it extends HTTP, HTML and XML-RPC in the
ways recommended by those specifications).
- Not site specific
- I use the same pingback server, with exactly the same HTTP header, for www.hixie.ch,
www.damowmow.com, and my half a dozen other sites. Trackback has to be configured individually
for each server (mainly due to the whole ID problem).
- Note to self: Read the pingback spec. Form opinion.
- What's the verdict, Dave?
- Pingback doesn't define how to get hold of the page's metadata
- It doesn't need to, that is out of the scope of pingback. To obtain metadata out of a URI, you already have many
methods: for HTML and XHTML documents there are <title> and <meta> elements and lang attributes.
For PNG images there are metadata chunks with embedded RDF. And so on. There's simply no reason
for pingback to add yet another method of obtaining metadata out of documents when so many exist already.
- Static Web log users can't benefit from pingback
- Not true! Thanks to the pingback proxies even users of static sites can use pingback, converting simple pingbacks into
e-mails, referrers, or even trackbacks.
- Why is pingback using old technologies such as HTTP instead of new technologies such as Jabber?
- HTTP and XML-RPC are the most appropriate technologies for the task. Pingback is a stateless transaction, not an online presence
or a one-way instant message. HTTP is established and implemented, and it works. Jabber is new, experimental, and not very
widely spread (I use it, mind you; my Jabber ID is ian@hixie.ch, just like my e-mail address). Good language design
tries to avoid jumping on bandwagons, sticking to the appropriate protocols for each task.
- Trackback works.
- Trackback has many limitations in its design, as described above. For many cases, it simply doesn't work.
- If pingback's so great, where is the demonstrable code?
- Well, there are over a dozen active pingback-enabled Web logs on the Web at the moment, including this one. There is Free code
available as well; the links are in the spec.
- How about making it two way?
- We are currently working on a way of fetching pingbacks; expect to see it in the 2.0 spec. At the moment, if you want to implement it,
use pingback.extensions.getPingbacks(): no arguments, and the return value is an array of strings. Try it on this site if you like;
it is supported. You can also use that method on the server behind the spec itself to see who has pingbacked the spec.
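To make the "simple rules" concrete, here is a hedged Python sketch of the client side described in the answers above: look for the HTTP header, fall back to the content, then make an XML-RPC call with the two URIs. It assumes the header is called X-Pingback, that the in-content fallback is a link element with rel="pingback", and that the method is named pingback.ping, as I understand the published Pingback 1.0 spec; check the spec for the exact matching rules. The function names and example.org URIs are invented, and nothing here touches the network.

```python
import html
import re
import xmlrpc.client
from typing import Mapping, Optional

# Assumed fallback pattern: a link element with rel="pingback" in the content.
LINK_RE = re.compile(r'<link rel="pingback" href="([^"]+)" ?/?>')


def find_pingback_server(headers: Mapping[str, str], body: str) -> Optional[str]:
    """Return the pingback server URI for a fetched target, or None."""
    # Step 1: the HTTP header wins if it is present (assumed name: X-Pingback).
    for name, value in headers.items():
        if name.lower() == "x-pingback":
            return value.strip()
    # Step 2: otherwise search the content for the link element.
    match = LINK_RE.search(body)
    if match:
        # Simplification: unescape all HTML entities in the attribute value.
        return html.unescape(match.group(1))
    return None


def build_ping_request(source_uri: str, target_uri: str) -> str:
    """Build the XML-RPC body for a ping: just the two URIs."""
    return xmlrpc.client.dumps((source_uri, target_uri), methodname="pingback.ping")


def build_list_request() -> str:
    """Build the body for the experimental listing call, which takes no arguments."""
    return xmlrpc.client.dumps((), methodname="pingback.extensions.getPingbacks")


page = '<html><head><link rel="pingback" href="http://example.org/xmlrpc" /></head></html>'
print(find_pingback_server({}, page))
print(build_ping_request("http://example.org/my-post", "http://example.org/your-post"))
```

The request body would then be sent to the discovered server with an HTTP POST. Because the header check needs no knowledge of the content type, the same discovery code works for plain text files and images.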
I almost certainly missed some questions, in which case, please let me know. I'll hopefully
be turning this into a FAQ at some point.
In conclusion, I think trackback needs a better defined specification, and has many design flaws which make it inappropriate as a
pingback competitor. In the interests of letting the trackback and pingback communities interact, though,
I've implemented two scripts
that act as gateways to and from trackback and pingback. I'm using the trackback-to-pingback proxy on this site to enable trackback-enabled sites to link to me too. Feedback welcome!
Swimfan has a point to make ("don't cheat on your girlfriend, no really, we mean it, don't cheat on her") and it makes no secret of this fact.
Technically, it wasn't too bad. The plot did hold together, and the acting was of a reasonable quality. The editing was a little too in-your-face for my liking, but I got used to it. It just wasn't a particularly exciting movie.