
Picture of the page in action.

The major changes in this version include a new feature to normalise text as NFC or NFD, the ability to accept decimal code point values, and an overhaul of the top part of the user interface.

Added buttons to the Text area to allow conversion of the text to NFC or NFD normalization forms. (You may not notice the change until you list the characters, because the two forms usually look identical on screen.)
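Here is a small illustration of why the change can be invisible. It uses the built-in String.prototype.normalize found in current JavaScript engines, not the tool's own normalization code. listCodePoints is a hypothetical helper that prints each character as U+XXXX.

```javascript
// Hypothetical helper: list the code points of a string as "U+XXXX" labels.
function listCodePoints(s) {
  // Array.from iterates by code point, so supplementary characters stay whole.
  return Array.from(s, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));
}

// "résumé" typed with combining acute accents (U+0301).
const word = "re\u0301sume\u0301";
const composed = word.normalize("NFC");   // precomposed é (U+00E9)
const decomposed = word.normalize("NFD"); // e + U+0301

console.log(composed, decomposed);        // both render as résumé
console.log(composed === decomposed);     // false
console.log(listCodePoints(composed).join(" "));
// U+0072 U+00E9 U+0073 U+0075 U+006D U+00E9
console.log(listCodePoints(decomposed).join(" "));
// U+0072 U+0065 U+0301 U+0073 U+0075 U+006D U+0065 U+0301
```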

The control panel has also been substantially rearranged again, in the hope of making it easier for newcomers to see what they can do.

The Code point conversion feature was upgraded to handle decimal code point values.

A single character in the codepoints area or text area is now listed in the lower left panel when you click on  , rather than in the right-hand properties panel. This is to improve consistency and avoid surprises.

Added a link to the CLDR property demo from the right panel to give access to additional properties.

Improved the parsing of code points when they are surrounded by text in the Code point input field, so that it now works with &#x…; and \u… and \U… escapes.
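This is a rough sketch of how code points might be picked out of surrounding text. The set of accepted notations and the name extractCodePoints are assumptions for illustration. Beyond the escapes mentioned above, the set includes a decimal &#…; form and U+ notation. This is not a description of the tool's actual parser. A surrogate pair written as two \u escapes (for example \uD83D\uDE00) comes back as two separate values here.

```javascript
// Hypothetical helper: find code point values embedded in ordinary text.
// Recognised forms (an assumption, not the tool's actual rules):
//   &#xE9;       hexadecimal character reference
//   &#233;       decimal character reference
//   \u00E9       4-digit escape (one UTF-16 code unit)
//   \U0001F600   8-digit escape
//   U+1F600      Unicode notation
function extractCodePoints(text) {
  const pattern =
    /&#x([0-9A-Fa-f]+);|&#([0-9]+);|\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})|U\+([0-9A-Fa-f]{4,6})/g;
  const values = [];
  for (const m of text.matchAll(pattern)) {
    // Only the decimal reference (group 2) is base 10; the other groups are hex.
    const value = m[2] !== undefined
      ? parseInt(m[2], 10)
      : parseInt(m[1] ?? m[3] ?? m[4] ?? m[5], 16);
    // Ignore anything outside the Unicode code space.
    if (value <= 0x10ffff) values.push(value);
  }
  return values;
}

const sample = "é is &#xE9; or \\u00E9, ë is &#235;, 😀 is \\U0001F600 or U+1F600";
const found = extractCodePoints(sample);
console.log(found.map((cp) => cp.toString(16).toUpperCase()).join(" ")); // E9 E9 EB 1F600 1F600
console.log(String.fromCodePoint(...found)); // ééë😀😀
```

Plain words in the surrounding text ("is", "or") are skipped because they match none of the notations. Bare hex runs are deliberately not accepted, since English words such as "add" or "cafe" are valid hex.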

Jettisoned some unneeded code to reduce the download size by around 40-50 KB. Implemented the NFC/NFD feature using AJAX, to avoid pushing the download size back up.

When you delete the contents of the text area or the code point area, the associated input field is given focus, so you are ready for input.

A couple more minor bug fixes.

I was asked to make the code for my JavaScript and PHP normalization functions available. The links are below. I’m making the code available under a Creative Commons Attribution-Noncommercial-Share Alike licence.

Disclaimers
Note that I make no claim to have produced polished, compact or well-optimised code! The code does what I need, and I’m happy with that. You are welcome to suggest improvements, and I’m sure there are many that could be made.

As they say, this code is made available in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.

The code is a little more convoluted than it ought to be. This is to get around the fact that JavaScript handles supplementary characters as pairs of UTF-16 surrogates rather than as single characters, and that PHP just doesn’t naturally understand Unicode. (How I long for PHP6.)
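JavaScript strings are sequences of 16-bit code units, so a supplementary character such as U+20000 (the first CJK Extension B ideograph) occupies two positions. Those are a high surrogate and a low surrogate, and length and indexing count them separately. The sketch shows the kind of pairing arithmetic that has to be done by hand. toCodePoints is a hypothetical illustration, not an excerpt from the normalization routines. Current engines also offer codePointAt and code point iteration, which handle this for you.

```javascript
// Hypothetical helper: split a UTF-16 string into code points by pairing
// high surrogates (U+D800..U+DBFF) with low surrogates (U+DC00..U+DFFF).
function toCodePoints(s) {
  const out = [];
  for (let i = 0; i < s.length; i++) {
    const hi = s.charCodeAt(i);
    if (hi >= 0xd800 && hi <= 0xdbff && i + 1 < s.length) {
      const lo = s.charCodeAt(i + 1);
      if (lo >= 0xdc00 && lo <= 0xdfff) {
        // Each half carries 10 bits; the result is offset by 0x10000.
        out.push((hi - 0xd800) * 0x400 + (lo - 0xdc00) + 0x10000);
        i++; // skip the low surrogate just consumed
        continue;
      }
    }
    // BMP character, or an unpaired surrogate passed through unchanged.
    out.push(hi);
  }
  return out;
}

const s = "a\u{20000}b";
console.log(s.length);        // 4 code units for 3 characters
console.log(toCodePoints(s)); // 97, 131072 (0x20000), 98
```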

Update: I meant to mention that PHP can already do normalization, through the Normalizer class in the intl extension. I made this code available simply because I had it; I created it as a learning exercise. It may still be useful, however, if you are unable to load the ICU and intl packages onto your server.

To use the code, simply call nfc('your-text-string') or nfd('your-text-string') from your code and capture the result.

For PHP you’ll need these routines and this data.

For JavaScript, you’ll need these routines and this data. There is also a lite version of the data file that doesn’t include Han characters. I sometimes use it to save bandwidth (it is about 14 KB smaller).

Test files
I also created some test files for PHP and for JavaScript. Both of these expect to find a copy of in the local directory. These files run 71,076 tests.

Cautions
Be careful about which editor you use for the data files. I spent several hours fruitlessly debugging the routines, only to find that Notepad++ was displaying certain supplementary characters correctly but corrupting them on save. I switched to Notepad and the problem evaporated. I probably don’t need to add that editing the data files in something like Dreamweaver is a bad idea, because it will probably normalize the data before saving.
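One way to catch this sort of corruption early is to scan a saved data file for the usual symptoms: U+FFFD replacement characters or unpaired surrogates. The sketch below is a hypothetical heuristic (findSuspectCharacters is an illustrative name). It assumes the data files should contain neither symptom. It is not meant for the PHP source, which deliberately contains surrogate-related byte sequences that show up as U+FFFD.

```javascript
// Hypothetical sanity check: report U+FFFD and unpaired surrogates, both of
// which suggest an editor mangled supplementary characters on save.
function findSuspectCharacters(text) {
  const problems = [];
  for (let i = 0; i < text.length; i++) {
    const u = text.charCodeAt(i);
    if (u === 0xfffd) {
      problems.push({ index: i, reason: "replacement character" });
    } else if (u >= 0xd800 && u <= 0xdbff) {
      const next = i + 1 < text.length ? text.charCodeAt(i + 1) : -1;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // well-formed pair: skip the low surrogate
      } else {
        problems.push({ index: i, reason: "unpaired high surrogate" });
      }
    } else if (u >= 0xdc00 && u <= 0xdfff) {
      problems.push({ index: i, reason: "unpaired low surrogate" });
    }
  }
  return problems;
}

console.log(findSuspectCharacters("ok \u{20000} fine")); // no problems
console.log(findSuspectCharacters("bad \uD840 x \uFFFD")); // two problems
```

In Node, reading the file with fs.readFileSync(path, "utf8") turns invalid UTF-8 byte sequences into U+FFFD, so a corrupted save should show up as replacement characters in the result.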

Another point: you may see Unicode replacement characters (U+FFFD) at a couple of points in the PHP source. These represent the first and last characters in the high surrogate range, U+D800 and U+DBFF, which cannot be displayed on their own.
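Suppose the PHP source holds U+D800 and U+DBFF as raw three-byte sequences. That is an assumption about how they are written, not something confirmed above. Those bytes then follow the ordinary UTF-8 bit pattern. Well-formed UTF-8 (RFC 3629) does not allow surrogate code points, so editors and browsers display such bytes as U+FFFD. threeByteForm is a hypothetical helper that applies the pattern regardless. The comparison uses the standard TextEncoder, which replaces a lone surrogate with U+FFFD before encoding.

```javascript
// Hypothetical helper: apply the three-byte UTF-8 bit pattern to a code
// point in U+0800..U+FFFF, including surrogates that strict UTF-8 forbids.
function threeByteForm(cp) {
  if (cp < 0x800 || cp > 0xffff) throw new RangeError("expected U+0800..U+FFFF");
  return [
    0xe0 | (cp >> 12),         // 1110xxxx: top 4 bits
    0x80 | ((cp >> 6) & 0x3f), // 10xxxxxx: middle 6 bits
    0x80 | (cp & 0x3f),        // 10xxxxxx: low 6 bits
  ];
}

const hex = (bytes) =>
  bytes.map((b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");

console.log(hex(threeByteForm(0xd800))); // ED A0 80
console.log(hex(threeByteForm(0xdbff))); // ED AF BF
// A conforming encoder refuses: the lone surrogate becomes U+FFFD.
console.log(hex(Array.from(new TextEncoder().encode("\uD800")))); // EF BF BD
```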

Experimenting
If you want to play with something that uses this, you could try my Tłįchǫ (Dogrib) character picker or my Normalizer tool. I will gradually add this to all the pickers and to UniView. I have a local version of UniView waiting in the wings that uses the PHP files via AJAX, to reduce download size. For that you need a server-side file that returns the result as plain text across the wire, such as this.
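A minimal client-side sketch of such an AJAX call follows, written with the modern fetch API. It is not necessarily how the original pages make the request. The endpoint path /normalize.php, the text and form parameter names, and the function name normalizeRemote are all hypothetical. The fetch implementation is passed in as a parameter so the function can be tried without a server.

```javascript
// Hypothetical client for a server-side normalizer that replies in plain text.
// Endpoint path and parameter names are assumptions, not the real interface.
async function normalizeRemote(text, form, endpoint = "/normalize.php", fetchImpl = fetch) {
  if (form !== "NFC" && form !== "NFD") {
    throw new RangeError("form must be NFC or NFD");
  }
  // POST the text as form data; this avoids URL length limits for long input.
  const body = new URLSearchParams({ text, form });
  const response = await fetchImpl(endpoint, { method: "POST", body });
  if (!response.ok) {
    throw new Error(`normalizer returned HTTP ${response.status}`);
  }
  return response.text(); // the server sends the normalized text as-is
}
```

In a page this might be called as normalizeRemote(textArea.value, "NFC").then((result) => { textArea.value = result; }). Only the few bytes of the request and reply cross the wire, instead of the normalization data files.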

Well, I hope that that may be of use to someone, somewhere. I hope I haven’t forgotten anything.