This tool allows you to normalise short pieces of text to Unicode forms NFC or NFD. You can paste the relevant text into a text area, or append it to the uri that calls the page, as in the Vietnamese example.

Note that, although I spell normalisation in the British way in this post, the uri uses the American spelling, since I suspect most users of the tool will expect it to be spelt that way.

Wondering what normalisation is? In Unicode a letter like á can be represented either by a single precomposed character (U+00E1) or by an a followed by a combining acute accent, U+0301 (a decomposed sequence). Unicode regards these two representations as canonically equivalent. If you are comparing strings, therefore, you need to know which representations are equivalent. Usually you would normalise both strings to the same normalisation form before comparing them, so that the comparison itself can be a simple, efficient match. Unicode defines four normalisation forms, two of which, NFD and NFC, are handled by this tool.

Basically, NFD decomposes all precomposed characters into their decomposed equivalents, whereas NFC recomposes the result, using a precomposed character wherever one exists (with a small number of exceptions).
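
The tool itself runs in the browser. Purely as an illustration, and not the tool's own code, the sketch below uses Python's standard `unicodedata` module to apply the same two forms. The helpers `code_points` and `same_text` are hypothetical names. The sketch shows that the precomposed and decomposed á differ code point by code point but compare equal once both sides are normalised.

```python
import unicodedata

def code_points(s: str) -> str:
    """Show a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

precomposed = "\u00e1"   # á as a single precomposed character
decomposed = "a\u0301"   # a followed by COMBINING ACUTE ACCENT

# Canonically equivalent, but a plain comparison sees different code points.
print(precomposed == decomposed)  # False

for form in ("NFC", "NFD"):
    print(form, code_points(unicodedata.normalize(form, precomposed)),
          "|", code_points(unicodedata.normalize(form, decomposed)))
# NFC U+00E1 | U+00E1
# NFD U+0061 U+0301 | U+0061 U+0301

# Normalising both sides to the same form first makes comparison a simple equality test.
def same_text(a: str, b: str, form: str = "NFC") -> bool:
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

print(same_text(precomposed, decomposed))  # True

# A Vietnamese letter with two marks: NFD orders the dot below (combining class 220)
# before the circumflex (combining class 230).
print(code_points(unicodedata.normalize("NFD", "\u1ec7")))  # U+0065 U+0323 U+0302
```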

The major changes in this version of UniView relate to the way searching and property-based lookup are done on characters in the lower left panel, and to new features for refining and capturing the resulting lists.

Removed the two Highlight selection boxes. These used to highlight characters in the lower left panel with a specific property value. The Show selection box on the left (used to be Show list) now does that job if you set the Local checkbox alongside it. (Local is the default for this feature.)

As part of that move, the former SiR (search in range) checkbox that used to be alongside Custom range has been moved below the Search for input field, and renamed to Local. If Local is checked, searching can now be done on any content in the lower left panel, and the results are shown as highlighting, rather than a new list.

To complement these new highlighting capabilities, I added a new feature. If you click on the icon next to Make list from highlights, the content of the lower left panel is replaced by a list of just those items that are currently highlighted, whether the highlighting results from a search or a property listing. This can also be useful for refining searches: perform an initial search, convert the result to a list, then perform another search on that list, and so on.
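
The refine-by-repeated-search idea can be sketched in Python, with each search narrowing the previous result list. This is not how UniView works internally. `block_chars` and `search` are hypothetical helpers, and matching only against character names is a simplification.

```python
import unicodedata

def block_chars(start: int, end: int) -> list[str]:
    """Hypothetical helper: assigned characters in the inclusive range start..end."""
    return [chr(cp) for cp in range(start, end + 1)
            if unicodedata.category(chr(cp)) != "Cn"]

def search(chars: list[str], term: str) -> list[str]:
    """Hypothetical helper: keep characters whose Unicode name contains term."""
    return [c for c in chars if term in unicodedata.name(c, "")]

# First search: the Latin Extended Additional block (U+1E00..U+1EFF) for "DOT BELOW".
first = search(block_chars(0x1E00, 0x1EFF), "DOT BELOW")
# Turn the hits into a list, then search only within that shorter list.
refined = search(first, "CIRCUMFLEX")

for c in refined:
    print(f"{ord(c):04X}: {c} {unicodedata.name(c)}")
```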

Finally got around to putting icons after the pull-down lists. This means that if you want to reapply, say, a block selection after doing something else, only one click is needed (rather than having to choose another option and then choose the original one again). This has improved the ease of use of UniView much more than I expected.

Added an icon to the text area. If you click on it, all the characters in the lower left panel are copied into the text area. This is very useful for capturing the result of a search, or even a whole block. Note that if a list in the lower left panel contains unassigned code points, these are not copied to the text area.
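
Here is a small Python sketch of the copy rule just described, which turns a list of code points into text and skips unassigned ones. It assumes "unassigned" means General_Category Cn, as reported by Python's `unicodedata` for the Unicode version Python ships with. Noncharacters also fall under Cn, so they would be dropped too. `copy_list_to_text` is a hypothetical name, not UniView code.

```python
import unicodedata

def copy_list_to_text(code_points: list[int]) -> str:
    """Join the characters for a list of code points into one string,
    dropping code points whose General_Category is Cn (unassigned)."""
    return "".join(
        chr(cp) for cp in code_points if unicodedata.category(chr(cp)) != "Cn"
    )

print(copy_list_to_text([0x0041, 0x0378, 0x0042]))  # AB  (U+0378 is unassigned)
```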

As a result of the above changes, the way Show as graphics and Show range as list work internally was essentially rewritten, but users shouldn’t see the difference.

Changed the label Character area to Text area.

The main change in this version is the reworking of the former Cut & paste and Code point(s) fields to make it easier to use UniView as a generalised picker.

Moved the Cut&paste field downwards, made it larger, and changed its label to Character area. This should make it easier to copy, cut and paste text, and make it more obvious that UniView supports this. It is now much clearer that UniView provides character map/picker functionality, and not just character lookup.

Whereas previously you had to double-click a character in the lower left pane to put it into the Cut&paste field, UniView now echoes a character to the Character area every time you (single) click on it in the lower left pane. This echo can be turned off. Double-clicking still adds the code point of a character in the lower left panel to the Code points field.

The Character area has its own set of icons, some of which are new: they let you select the text, add a space, and change the font of the text in the area, as well as turn the echo on and off. I also spruced up the icons in the UI in general.

Note that in most browsers you can insert characters at the point in the Character area where you set the cursor, or overwrite a highlighted range of characters, whereas Internet Explorer (because of the non-standard way it handles selections and ranges) will always add characters to the end of the line.

The Code points field has also been enlarged, and I moved the Show list pull-down to the left and Show as graphics and Show page as list to the right. This puts all the main commands for creating lists together on the left.

When you mouse over a character in the lower left pane you now see both its hex and decimal code point values. (Previously you just saw an unlabelled decimal number.) You will also find decimal code point values for characters displayed in the lower right panel.
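
The hex and decimal values are just two notations for the same number. As a quick Python illustration, `code_point_label` is a hypothetical helper, and its output format is an assumption rather than UniView's actual tooltip text.

```python
def code_point_label(ch: str) -> str:
    """Hypothetical tooltip text giving a character's code point in hex and decimal."""
    cp = ord(ch)
    return f"U+{cp:04X} (decimal {cp})"

print(code_point_label("\uCE20"))  # U+CE20 (decimal 52768)
print(code_point_label("\u11B8"))  # U+11B8 (decimal 4536)
```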

Fixed a bug in the Code points input feature so that trailing spaces no longer produce errors, but also went much further than that. You can now paste arbitrary text containing code points, or most types of hex-based escaped characters, into the input field, and UniView will seek them out to create the list. For example, if you paste the following into the Code points field:

the decomposition mapping is <U+CE20, U+11B8>, and not <U+110E, U+1173, U+11B8>.

the result will be:

CE20: 츠 [Hangul Syllables]
11B8: ᆸ HANGUL JONGSEONG PIEUP
110E: ᄎ HANGUL CHOSEONG CHIEUCH
1173: ᅳ HANGUL JUNGSEONG EU
11B8: ᆸ HANGUL JONGSEONG PIEUP

Of course, UniView cannot tell that an ordinary word like ‘Abba’ is not a hex code point, so you need to watch out for that and a few similar situations, but most of the time this should make extracting code point information much easier.
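
UniView's exact matching rules aren't described here, so the following is only a Python sketch of one way to pull code points out of free text. The regular expression and the helpers `extract_code_points` and `describe` are hypothetical. The sketch accepts a few common escape prefixes (U+, \u, \x{, &#x, 0x) or a bare run of 4 to 6 hex digits standing on its own. Because bare hex words qualify, it also shows the ‘Abba’ pitfall.

```python
import re
import unicodedata

# Hypothetical pattern, not UniView's code: either an escape prefix
# (U+, \u, \x{, &#x or 0x) or a position not preceded by a letter or digit,
# then 4 to 6 hex digits that are not followed by a letter or digit.
CODE_POINT = re.compile(
    r"(?:U\+|\\u|\\x\{|&#x|0x|(?<![0-9A-Za-z]))([0-9A-Fa-f]{4,6})(?![0-9A-Za-z])"
)

def extract_code_points(text: str) -> list[int]:
    """Return every plausible code point found in text, in order of appearance."""
    values = (int(m.group(1), 16) for m in CODE_POINT.finditer(text))
    return [cp for cp in values if cp <= 0x10FFFF]

def describe(cp: int) -> str:
    """Format one result line as 'XXXX: char NAME' (the name may be empty)."""
    ch = chr(cp)
    return f"{cp:04X}: {ch} {unicodedata.name(ch, '')}".rstrip()

text = "the decomposition mapping is <U+CE20, U+11B8>, and not <U+110E, U+1173, U+11B8>."
for cp in extract_code_points(text):
    print(describe(cp))

# The same pitfall as in UniView: a bare word made of hex letters is read as a code point.
print(extract_code_points("an ordinary word like ‘Abba’"))  # [43962], i.e. U+ABBA
```

Requiring 4 to 6 digits keeps short words like "a" or "add" from matching. The cost is that short escapes such as `&#x41;` are missed.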

I still haven’t found a way to fix the display bug in Safari and Google Chrome that causes initial content in the lower left pane to be only partially displayed.