I’ve been lucky enough to have access to a pre-publication electronic version of the new Unicode Standard 5 book, and though I’ve been terribly busy just lately, I’ve carved out a little time to read and even use some of it. And I like what I see.
I’ve always thought the Unicode book was a really useful thing to have if you need to understand the ins and outs of Unicode for implementation purposes, or if you are simply interested in how scripts work. It has always been relatively easy to read, and more like a guidebook than a standard, if you know what I mean. The good news is that this seems to be even more the case in the latest version. There are lots of small edits that improve the clarity of the text and make it more readable.
There are, however, some more significant changes that are also very welcome. For example, I’ve been looking at first-letter styling in CSS recently, particularly in the context of Indian scripts, but despite a lot of searching I was unable to figure out where the Standard actually told me that a default grapheme cluster didn’t cover a whole Indic syllable. The grapheme cluster concept is really quite an important one for implementations, and it was frustrating to see it described so poorly.
All that has changed with extensive additions to Chapter 3. Section 3.6, Combination, now contains a substantial amount of new text that explains grapheme clusters quite clearly. And don’t be put off by the dour-sounding title of Chapter 3, Conformance: it contains lots of useful definitions and explanations in the book’s typical clear and succinct style.
I have to admit to a tinge of disappointment that the Standard Annexes, which are now included in the book, have simply been added as appendices rather than integrated into the text proper. However, my evaluation copy didn’t actually contain this text, so I can’t comment further.
Also, I decided a short while ago that I needed to finally get to grips with Tibetan script, and that has become more urgent now that I will be visiting Bhutan in January. I was disappointed, therefore, to find that the section on Tibetan script had not been edited at all. To my mind, that section has always been substandard in terms of clarity and writing style.
On the other hand, I see that useful additions have been made to existing block descriptions elsewhere, such as a new section on Rendering of Thai Combining Marks in the Thai description. I see similar additions to the Lao, Gujarati and Gurmukhi block descriptions, and the Bengali block description seems to have been largely rewritten. I’m looking forward to getting my teeth into those, and also into the numerous enticing new block descriptions, such as Phags-pa, N’Ko, Sumero-Akkadian (Cuneiform) and the like.
So would I recommend it? Certainly. The Unicode Standard is a mine of useful and accessible information if, as I said, you are implementing Unicode-based applications or you are interested in how scripts work. And it’s worth replacing your previous version, not only because the new, smaller format will make it much easier to handle and keep on your bookshelf, but also because of the value of the many useful additions. I’ll be picking up my copy at the Unicode Conference in Washington next month.

November 7th, 2006 at 12:43 pm
I received a nice note from Julie Allen, Senior Editor and Project Manager at Unicode, Inc., which contained the following information:
“Thanks for your very nice review of Unicode 5.0. We appreciate your support. As the editor, I especially appreciated your kind words about the clarity and readability of the text. We work hard on that. And I’m glad you noted the many script descriptions and other parts we rewrote.”
“Of course, you’ll see the book for yourself soon, but I wanted to comment on the integration of the annexes and your comment about Tibetan. I think you’ll be pleasantly surprised at how nice the annexes look. It is true that they are not fully integrated into the book, but we worked very hard to edit them and get them to the same level of quality as the rest of the book. Because they are still published both as HTML on the web and in the book, this process was relatively complicated. We also decided to add the annexes very late in the publication schedule. While it would have been ideal to index them as part of the main book, this would have been impossible to do without throwing off the publication schedule, which was extremely tight even before adding the annexes. Overall, I think the result turned out well. (BTW, Asmus was the one who handled the annexes, and he spent considerable time on them.)”
“You’re right that Tibetan hasn’t changed from 4.0 and is probably not our strongest script description. Tibetan was an extremely difficult script to reach consensus on with the various groups who helped draft the description. If memory serves, it was one of the last pieces of text we completed for 4.0, so it probably got less scrutiny than some others that we completed earlier. If someone steps forward, maybe we can improve the text next time.”
November 29th, 2006 at 8:02 pm
I just got my (printed) copy. At nearly 6 lb, it is a pretty hefty book.
I browsed a little, and it looks good. But then, I’ve always thought that writing systems are both beautiful and fascinating.
I’m looking forward to the weekend when I’ll be able to do some serious reading. Your review just makes me want to get there faster…
January 7th, 2007 at 12:22 am
I’d just like to thank you for taking the time to create this website. It has been extremely helpful.