loge.hixie.ch

Hixie's Natural Log

2002-10-02 02:12 UTC My faceless enemy has been defeated

My arch nemesis over the last couple of weeks has been the very poorly documented Hebrew Traditional Numbering System.

It is insanely complicated.

It is based on addition. For each group of three decimal digits, numbers are selected from a list (which, by the way, is not sequential). There is a digit for each unit number (1 to 9), one for each multiple of ten (10-90) and one for each of the first four hundreds (100-400). First complication: to make numbers higher than 499, you have to use combinations of the hundreds. For example, 915 is 400, 400, 100, 10, 5, which is written as תתקיה.
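To make the additive core concrete, here is a minimal Python sketch that encodes a single three-digit group by plain addition, stacking 400s for the larger hundreds. It deliberately ignores every exception the post goes on to describe. The names `VALUES`, `LETTER_FOR` and `additive_group` are hypothetical, chosen for illustration. It also prints a few code points, showing why the list is "not sequential": the final letter forms are interleaved with the ones used as numerals.

```python
# Numeric values of the 22 letters, smallest first (non-final forms only).
VALUES = list(range(1, 10)) + list(range(10, 100, 10)) + [100, 200, 300, 400]
LETTERS = "אבגדהוזחטיכלמנסעפצקרשת"
LETTER_FOR = dict(zip(VALUES, LETTERS))


def additive_group(n: int) -> str:
    """Encode one group (1..999) by plain addition.

    Deliberately ignores the 15/16 exception; this shows only the additive
    core, including stacking 400s for hundreds above 400.
    """
    if not 1 <= n <= 999:
        raise ValueError("a group must be between 1 and 999")
    hundreds, rest = divmod(n, 100)
    tens, units = divmod(rest, 10)
    addends = []
    while hundreds > 4:  # e.g. 900 = 400 + 400 + 100
        addends.append(400)
        hundreds -= 4
    if hundreds:
        addends.append(100 * hundreds)
    if tens:
        addends.append(10 * tens)
    if units:
        addends.append(units)
    return "".join(LETTER_FOR[v] for v in addends)


print(additive_group(915))  # תתקיה (400, 400, 100, 10, 5)

# The letters are not one contiguous run of code points: final forms such as
# U+05DA (final kaf) sit between them.
print([f"U+{ord(LETTER_FOR[v]):04X}" for v in (10, 20, 30)])
# ['U+05D9', 'U+05DB', 'U+05DC']
```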

Except it's not. The numbers 15 and 16 are too close to the Tetragrammaton (the four-letter name of God). So 15 and 16, instead of being written as 10+5 and 10+6 respectively, are written as 9+6 and 9+7. So 915 is תתקטו. If you are not familiar with bidirectional text, you might think that the first two characters have been changed instead of the last two, but that just brings up another complication... Hebrew is written right to left, so you have to read Unicode code points left to right and compare them to characters going right to left.
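The 15/16 exception only touches the tens-and-units part of a group, so it can be sketched as a small Python function that picks the addends for the last two digits. The function name `last_two_digits_addends` is hypothetical. The final lines list the code points of תתקטו in stored (logical) order. The first code point is the tav that is displayed rightmost, which is exactly the left-to-right versus right-to-left mismatch described above.

```python
def last_two_digits_addends(rest: int) -> list[int]:
    """Addends for the tens-and-units part (0..99) of a group.

    15 and 16 would be 10+5 and 10+6, too close to the Tetragrammaton,
    so they are written as 9+6 and 9+7 instead.
    """
    if not 0 <= rest <= 99:
        raise ValueError("expected a value between 0 and 99")
    if rest in (15, 16):
        return [9, rest - 9]
    tens, units = divmod(rest, 10)
    # Drop zero addends: there is no letter for zero.
    return [v for v in (10 * tens, units) if v]


print(last_two_digits_addends(15))  # [9, 6]
print(last_two_digits_addends(16))  # [9, 7]
print(last_two_digits_addends(14))  # [10, 4]

# Strings are stored in logical order: the first code point is the letter
# that is rendered rightmost.
print([f"U+{ord(c):04X}" for c in "תתקטו"])
# ['U+05EA', 'U+05EA', 'U+05E7', 'U+05D8', 'U+05D5']
```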

Of course it doesn't end there. If you are paying attention, you'll have noticed there is no number for zero. This makes writing numbers like 1000016 rather hard. So the word for thousand ‏(אלפי)‏ is used instead. Except that it is repeated the number of times required to get to the group which had the zero: in this case twice, since the 1 is a thousand thousand (otherwise known as a million). So 1000016 is א אלפי אלפי יו. Oops, forgot about the issue with 16. Oh, and there's another problem. The last occurrence of the "thousand" word in each chain of such words has to have a special letter added at the end. So it is: א אלפי אלפים טז.
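The chain of "thousand" words can be sketched separately from the letter encoding. This Python sketch assumes the caller has already encoded the leading and trailing groups (with 16 already written as טז) and knows `count`, the number of factors of a thousand between them. The helper names and that calling convention are hypothetical, read off the post's 1000016 example. The special form for one thousand, covered in the next paragraph of the post, is not applied here.

```python
THOUSAND = "אלפי"  # the word for thousand, in the form the post uses
FINAL_MEM = "ם"    # U+05DD, added to the last thousand-word of a chain


def thousand_chain(count: int) -> list[str]:
    """Return `count` thousand-words; only the last gets the final mem."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return [THOUSAND] * (count - 1) + [THOUSAND + FINAL_MEM]


def join_across_zero_groups(lead: str, count: int, tail: str = "") -> str:
    """Hypothetical helper.

    lead:  the already-encoded leading group, e.g. "א" for 1
    count: how many factors of a thousand separate it from the tail group
    tail:  the already-encoded last group, or "" if it is zero
    """
    words = [lead] + thousand_chain(count)
    if tail:
        words.append(tail)
    return " ".join(words)


print(join_across_zero_groups("א", 2, "טז"))  # א אלפי אלפים טז  (1000016)
```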

Except it's not. 1000 is a special case, you see. It is grammatically incorrect to just stick the word for "thousand" after the number for one. You have to use a special form for it instead. Which makes it אלף אלפים טז.

There's a similar rule for 2000. Except that it only applies at the end of a word, not if it is followed by more zeros. So two million is ב אלפי אלפים but two thousand is אלפיים.
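The special forms for one and two thousand can be sketched as a tweak to the leading group and its chain of "thousand" words. This is only a reading of the post's four examples (1000, 1000016, 2000 and two million), not a full statement of the rules. In particular, it assumes the dual form applies exactly when the chain has a single "thousand" word. The names `lead_with_thousands` and `spell`, and the `count` convention (factors of a thousand after the leading group), are hypothetical.

```python
def lead_with_thousands(lead: str, count: int) -> list[str]:
    """Leading group followed by `count` thousand-words, with the special
    forms for one and two as read from the post's examples (an assumption)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    chain = ["אלפי"] * (count - 1) + ["אלפים"]
    if lead == "א":
        # One plus the first thousand-word merge into the single word אלף.
        return ["אלף"] + chain[1:]
    if lead == "ב" and count == 1:
        # Two uses the dual form, but only when no further thousand-words follow.
        return ["אלפיים"]
    return [lead] + chain


def spell(lead: str, count: int, tail: str = "") -> str:
    """Hypothetical helper: join the pieces with spaces, as in the post."""
    words = lead_with_thousands(lead, count)
    if tail:
        words.append(tail)
    return " ".join(words)


print(spell("א", 1))        # אלף           (1000)
print(spell("א", 2, "טז"))  # אלף אלפים טז  (1000016)
print(spell("ב", 1))        # אלפיים        (2000)
print(spell("ב", 2))        # ב אלפי אלפים  (2000000)
```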

Oh, one more thing. All of those numbers are wrong because I didn't add the special characters indicating that they are numbers, not words. You add one special punctuation character (the geresh, ׳) to each group of one character, and another (the gershayim, ״) to groups of more than one character. The second of these, though, doesn't go at the end of the group; it goes just before the last character. And you never add these characters to groups consisting of the word for thousands. You have to add these characters if the numbers are used in prose, but mustn't if they are used in lists. And if you think that is complicated, wait till you hear about the (thankfully optional) further reorderings that can be made, such as 298 being written as 200+8+90 to avoid spelling the word for murder.
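The marking rules can be sketched as a pass over already-spelled, space-separated groups. This Python sketch assumes the two marks are the Hebrew punctuation characters geresh (U+05F3) and gershayim (U+05F4), and that the only "thousand" words that appear are the four forms quoted in the post. The function names and the `in_prose` flag are hypothetical. The optional reorderings, such as the one for 298, are not attempted.

```python
GERESH = "\u05F3"     # ׳  marks a one-letter group
GERSHAYIM = "\u05F4"  # ״  goes before the last letter of a longer group
THOUSAND_WORDS = {"אלף", "אלפי", "אלפים", "אלפיים"}


def mark_group(group: str) -> str:
    """Add the number mark to one space-separated group."""
    if group in THOUSAND_WORDS:
        return group  # thousand-words are never marked
    if len(group) == 1:
        return group + GERESH
    return group[:-1] + GERSHAYIM + group[-1]


def mark_number(text: str, in_prose: bool = True) -> str:
    """Marks are required in prose and must be omitted in lists."""
    if not in_prose:
        return text
    return " ".join(mark_group(g) for g in text.split(" "))


print(mark_number("תתקטו"))            # תתקט״ו
print(mark_number("א אלפי אלפים טז"))  # א׳ אלפי אלפים ט״ז
```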

I wrote a script to convert numbers into the Hebrew numbering system. I think it works for all numbers from zero up to some high number (probably 2^32). I also wrote, with a lot of help from Simon Montagu, pretty detailed documentation for all the rules (including code points) that apply to this ridiculously complicated numbering system. It will be in the next public draft of the CSS3 Lists Module.