Dochula Pass, Bhutan

Picture of the page in action.

A new Persian Character Picker web app is now available. The picker allows you to produce or analyse runs of Persian text using the Arabic script. Character pickers are especially useful for people who don’t know a script well, as characters are displayed in ways that aid identification.

The picker is able to produce UN transcriptions of the text in the box. The transcription appears just below the input box, where you can copy it, move it into the input box at the caret, or delete it. In order to obtain a full transcription it is necessary to add short vowel diactritics to places that could have more than one pronunciation, but the picker can work out the vowels needed for many letter combinations.

See the help file for more information.

Picture of the page in action.

I’ve been doing more work on the Egyptian Hieroglyph picker over the weekend.

The data behind the keyword search has now been completely updated to reflect descriptions by Gardiner and Allen. If you work with those lists it should now be easy to locate hieroglyphs using keywords. The search mechanism has also been rewritten so that you don’t need to type keywords in a particular order for them to match. I also strip out various common function words and do some other optimisation before attempting a match.

The other headline news is the addition of various controls above the text area, including one that will render MdC text as a two-dimensional arrangement of hieroglyphs. To do this, I adapted WikiHiero’s PHP code to run in javascript. You can see an example of the output in the picture attached to this post. If you want to try it, the MdC text to put in the text area is:
anx-G5-zmA:tA:tA-nbty-zmA:tA:tA-sw:t-bit:t-< -zA-ra:.-mn:n-T:w-Htp:t*p->-anx-D:t:N17-!

The result should look like this:

Picture of hieroglyphs.

Other new controls allow you to convert MdC text to hieroglyphs, and vice versa, or to type in a Unicode phonetic transcription and find the hieroglyphs it represents. (This may still need a little more work.)

I also moved the help text from the notes area to a separate file, with a nice clickable picture of the picker at the top that will link to particular features. You can get to that page by clicking on the blue Help box near the bottom of the picker.

Finally, you can now set the text area to display characters from right to left, in right-aligned lines, using more controls > Output direction. Unfortunately, i don’t know of a font that under these conditions will flip the hieroglyphs horizontally so that they face the right way.

For more information about the new features, and how to use the picker, see the Help page.

Picture of the page in action.

Over the weekend I added a set of new features to the picker for Egyptian Hieroglyphs, aimed at making it easier to locate a particular hieroglyph. Here is a run-down of various methods now available.

Category-based input

This was the original method. Characters are grouped into standard categories. Click on one of the orange characters, chosen as a nominal representative of the class, to show below all the characters in that category. Click on one of those to add it to the output box. As you mouse over the orange characters, you’ll see the name of the category appear just below the output box.

Keyword-search-based input

The app associates most hieroglyphs with keywords that describe the glyph. You can search for glyphs using those keywords in the input field labelled Search for.

Searching for ripple will match both ripple and ripples. Searching for king will match king and walking. If you want to only match whole words, surround the search term with colons, ie. :ripple: or :king:.

Note that the keywords are written in British English, so you need to look for sceptre rather than scepter.

The search input is treated as a regular expression, so if you want to search for two words that may have other words between them, use .*. For example, ox .* palm will match ox horns with stripped palm branch.

Many of the hieroglyphs have also been associated with keywords related to their use. If you select Include usage, these keywords will also be selected. Note that this keyword list is not exhaustive by any means, but it may occasionally be useful. For example, a search for Anubis will produce 𓁢 𓃢 𓃣 𓃤 .

(Note: to search for a character based on the Unicode name for that character, eg. w004, use the search box in the yellow area.)

Searching for pronunciations

Many of the hieroglyphs are associated with 1, 2 or 3 consonant pronunciations. These can be looked up as follows.

Type the sequence of consonants into the output box and highlight them. Then click on Look up from Latin. Hieroglyphs that match that character or sequence of characters will be displayed below the output box, and can be added to the output box by clicking on them. (Note that if you still have the search string highlighted in the output box those characters will be replaced by the hieroglyph.)

You will find the panel Latin characters useful for typing characters that are not accessible via your keyboard. The panel is displayed by clicking on the higher L in the grey bar to the left. Click on a character to add it to the output area.

For example, if you want to obtain the hieroglyph 𓎝, which is represented by the 3-character sequence wꜣḥ, add wꜣḥ to the output area and select it. Then click on Latin characters. You will see the character you need just above the SPACE button. Click on that hieroglyph and it will replace the wꜣḥ text in the output area. (Unhighlight the text in the output area if you want to keep both and add the hierglyph at the cursor position.)

Input panels accessed from the vertical grey bar

The vertical grey bar to the left allows you to turn on/off a number of panels that can help create the text you want.

Latin characters. This panel displays Latin characters you are likely to need for transcription. It is particularly useful for setting up a search by pronunciation (see above).

Latin to Egyptian. This panel also displays Latin characters used for transcription, but when you click on them they insert hieroglyphs into the output area. These are 24 hieroglyphs represented by a single consonant. Think of it as a shortcut if you want to find 1-consonant hieroglyphs by pronunciation.

Where a single consonant can be represented by more than one hieroglyph, a small pop-up will present you with the available choices. Just click on the one you want.

Egyptian alphabet. This panel displays the 26 hieroglyphs that the previous panel produces as hieroglyphs. In many cases this is the quickest way of typing in these hieroglyphs.

Picture of the page in action.
>> Use the picker

An update to version 17 of the Mongolian character picker is now available.

When you hover over or select a character in the selection area, the box to the left of that area displays the alternate glyph forms that are appropriate for that character. By default, this only happens when you click on a character, but you can make it happen on hover by clicking on the V in the gray selection bar to the right.

The list includes the default positional forms as well as the forms produced by following the character with a Free Variation Selector (FVS). The latter forms have been updated, based on work which has been taking place in 2015 to standardise the forms produced by using FVS. At the moment, not all fonts will produce the expected shapes for all possible combinations. (For more information, see Notes on Mongolian variant forms.)

An additional new feature is that when the variant list is displayed, you can add an appropriate FVS character to the output area by simply clicking in the list on the shape that you want to see in the output.

This provides an easy way to check what shapes should be produced and what shapes are produced by a given font. (You can specify which font the app should use for display of the output.)

Some small improvements were also made to the user interface. The picker works best in Firefox and Edge desktop browsers, since they now have pretty good support for vertical text. It works least well in Safari (which includes the iPad browsers).

For more information about the picker, see the notes at the bottom of the picker page.

About pickers: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more usable than a regular character map utility. See the list of available pickers.

Screen Shot 2015-01-18 at 07.42.56

Version 16 of the Bengali character picker is now available.

Other than a small rearrangement of the selection table, and the significant standard features that version 16 brings, this version adds the following:

  • three new buttons for automatic transcription between latin and bengali. You can use these buttons to transcribe to and from latin transcriptions using ISO 15919 or Radice approaches.
  • hinting to help identify similar characters.
  • the ability to select the base character for the display of combining characters in the selection table.

For more information about the picker, see the notes at the bottom of the picker page.

In addition, I made a number of additions and changes to Bengali script notes (an overview of the Bengali script), and Bengali character notes (an annotated list of characters in the Bengali script).

About pickers: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more usable than a regular character map utility. See the list of available pickers.

I’m struggling to show combining characters on a page in a consistent way across browsers.

For example, while laying out my pickers, I want users to be able to click on a representation of a character to add it to the output field. In the past I resorted to pictures of the characters, but now that webfonts are available, I want to replace those with font glyphs. (That makes for much smaller and more flexible pages.)

Take the Bengali picker that I’m currently working on. I’d like to end up with something like this:

comchacon0

I put a no-break space before each combining character, to give it some width, and because that’s what the Unicode Standard recommends (p60, Exhibiting Nonspacing Marks in Isolation). The result is close to what I was looking for in Chrome and Safari except that you can see a gap for the nbsp to the left.

comchacon1

But in IE and Firefox I get this:

comchacon2

This is especially problematic since it messes up the overall layout, but in some cases it also causes text to overlap.

I tried using a dotted circle Unicode character, instead of the no-break space. On Firefox this looked ok, but on Chrome it resulted in two dotted circles per combining character.

I considered using a consonant as the base character. It would work ok, but it would possibly widen the overall space needed (not ideal) and would make it harder to spot a combining character by shape. I tried putting a span around the base character to grey it out, but the various browsers reacted differently to the span. Vowel signs that appear on both sides of the base character no longer worked – the vowel sign appeared after. In other cases, the grey of the base character was inherited by the whole grapheme, regardless of the fact that the combining character was outside the span. (Here are some examples ে and ো.)

In the end, I settled for no preceding base character at all. The combining character was the first thing in the table cell or span that surrounded it. This gave the desired result for the font I had been using, albeit that I needed to tweak the occasional character with padding to move it slightly to the right.

On the other hand, this was not to be a complete solution either. Whereas most of the fonts I planned to use produce the dotted circle in these conditions, one of my favourites (SolaimanLipi) doesn’t produce it. This leads to significant problems, since many combining characters appear far to the left, and in some cases it is not possible to click on them, in others you have to locate a blank space somewhere to the right and click on that. Not at all satisfactory.

comchacon3

I couldn’t find a better way to solve the problem, however, and since there were several Bengali fonts to choose from that did produce dotted circles, I settled for that as the best of a bad lot.

However, then i turned my attention to other pickers and tried the same solution. I found that only one of the many Thai fonts I tried for the Thai picker produced the dotted circles. So the approach here would have to be different. For Khmer, the main Windows font (Daunpenh) produced dotted circles only for some of the combining characters in Internet Explorer. And on Chrome, a sequence of two combining characters, one after the other, produced two dotted circles…

I suspect that I’ll need to choose an approach for each picker based on what fonts are available, and perhaps provide an option to insert or remove base characters before combining characters when someone wants to use a different font.

It would be nice to standardise behaviour here, and to do so in a way that involves the no-break space, as described in the Unicode Standard, or some other base character such as – why not? – the dotted circle itself. I assume that the fix for this would have to be handled by the browser, since there are already many font cats out of the bag.

Does anyone have an alternate solution? I thought I heard someone at the last Unicode conference mention some way of controlling the behaviour of dotted circles via some script or font setting…?

Update: See Marc Durdin’s blog for more on this topic, and his experiences while trying to design on-screen keyboards for Lao and other scripts.

khmer-picker16

I have uploaded a new version of the Khmer character picker.

The new version uses characters instead of images for the selection table, making it faster to load and more flexible. If you prefer, you can still access the previous version.

Other than a small rearrangement of the default selection table to accomodate fonts rather than images, and the significant standard features that version 16 brings, there are no additional changes in this version.

For more information about the picker, see the notes at the bottom of the picker page.

About pickers: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more usable than a regular character map utility. See the list of available pickers.

uighur-picker16

devanagari-picker16

gurmukhi-picker16

I have updated the Devanagari picker, the Gurmukhi picker and the Uighur picker to version 16.

You may have spotted a previous, unannounced, version of the Devanagari and Uighur pickers on the site, but essentially these versions should be treated as new. The Gurmukhi picker has been updated from a very old version.

In addition to the standard features that version 16 of the character pickers brings, things to note include the addition of hints for all pickers, and automated transcription from Devanagari to ISO 15919, and vice versa for the Devanagari picker.

For more information about the pickers, see the notes at the bottom of the relevant picker page.

About pickers: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more usable than a regular character map utility. See the list of available pickers.

A couple of posts ago I mentioned that I had updated the Thai picker to version 16. I have now updated a few more. For ease of reference, I will list here the main changes between version 16 pickers and previous versions back to version 12.

  • Fonts rather than graphics. The main selection table in version 12 used images to represent characters. These have now gone, in favour of fonts. Most pickers include a web font download to ensure that you will see the characters. This reduces the size and download time significantly when you open a picker. Other source code changes have reduced the size of the files even further, so that the main file is typically only a small fraction of the size it was in version 14.

    It is also now possible, in version 16, to change the font of the main selection table and the font size.

  • UI. The whole look and feel of the user interface has changed from version 14 onwards, and includes useful links and explanations off the top of the normal work space.

    In particular, the vertical menu, introduced in version 14, has been adjusted so that input features can be turned on and off independently, and new panels appear alongside the others, rather than toggling the view from one mode to another. So, for example, you can have hints and shape-based selectors turned on at the same time. When something is switched on, its label in the menu turns orange, and the full text of the option is followed by a check mark.

  • Transcription panels. Some pickers had one or more transcription views in versions below 16. These enable you to construct some non-Latin text when working from a Latin transcription. In version 16 these alternate views are converted to panels that can be displayed at the same time as other information. They can be shown or hidden from the vertical menu. When there is ambiguity as to which characters to use, a pop up displays alternatives. Click on one to insert it into the output. There is also a panel containing non-ASCII Latin characters, which can be used when typing Latin transcriptions directly into the main output area. This panel is now hidden by default, but can be easily shown from the vertical menu.

  • Automated transcription. Version 16 pickers carry forward, and in some cases add, automated transcription converters. In some cases these are intended to generate only an approximation to the needed transcription, in order to speed up the transcription process. In other cases, they are complete. (See the notes for the picker to tell which is which.) Where there is ambiguity about how to transcribe a sequence of characters, the interface offers you a choice from alternatives. Just click on the character you want and it will replace all the options proposed. In some cases, particularly South-East Asian scripts, the text you want to transcribe has to be split into syllables first, using spaces and or hyphens. Where this is necessary, a condense button it provided, to quickly strip out the separators after the transcription is done.

  • Layout The default layout of the main selection table has usually been improved, to make it easier to locate characters. Rarely used, deprecated, etc, characters appear below the main table, rather than to the right.

  • Hints Very early versions of the pickers used to automatically highlight similar and easily confusable characters when you hovered over a character in the main selection table. This feature is being reintroduced as standard for version 16 pickers. It can be turned on or off from the vertical menu. This is very helpful for people who don’t know the script well.

  • Shape-based selection. In previous versions the shape-based view replaced the default view. In version 16 the shape selectors appear below the main selection table and highlight the characters in that table. This arrangement has several advantages.

  • Applying actions to ranges of text. When clicking on the Codepoints and Escapes buttons, it is possible to apply the action to a highighted range of characters, rather than all the characters in the output area. It is also possible to transcribe only highlighted text, when using one of the automated transcription features.

  • Phoneme bank. When composing text from a Latin transcription in previous versions you had to make choices about phonetics. Those choices were stored on the UI to speed up generation of phonetic transcriptions in addition to the native text, but this feature somewhat complicated the development and use of the transcription feature. It has been dropped in version 16. Hopefully, the transcription panels and automated transcription features will be useful enough in future.

  • Font grid. The font grid view was removed in version 16. It is of little value when the characters are already displayed using fonts.

About pickers: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more usable than a regular character map utility. See the list of available pickers.

I have uploaded another new version of the Thai character picker.

Sorry this follows so quickly on the heels of version 15, but as soon as I uploaded v15 several ideas on how to improve it popped into my head. This is the result. I will hopefully bring all the pickers, one by one, up to the new version 16 format. If you prefer, you can still access version 12.

The main changes include:

  • UI. Adjustment of the vertical menu, so that input features can be turned on and off independently, and new panels appear with the others, rather than toggling from one to another. So, for example, you can have hints and shape-based selectors turned on at the same time. When something is switched on, its label in the menu turns orange, and the full text of the option is followed by a check mark.
  • Transcription panels. Panels have been added to enable you to construct some Thai text when working from a Latin transcription. This brings the transcription inputs of version 12 into version 16, but in a more compact and simpler way, and way that gives you continued access to the standard table for special characters.

    There are currently options to transcribe from ISO 11940-2 (although there are some gaps in that), or from the transcription used by Benjawan Poomsan Becker in her book, Thai for Beginners. These are both transcriptions based on phonetic renderings of the Thai, so there is often ambiguity about how to transcribe a particular Latin letter into Thai. When such an ambiguity occurs, the interface offers you a choice via a small pop-up. Just click on the character you want and it will be inserted into the main output area.

    The transcription panels are useful because you can add a whole vowel at a time, rather than picking the individual vowel signs that compose it. An issue arises, however, when the vowel signs that make up a given vowel contain one that appears to the left of the syllable initial consonant(s). This is easily solved by highlighting the syllable in question and clicking on the reorder button. The vowel sign in question will then appear as the first item in the highlighted text.

    There is also a panel containing non-ASCII Latin characters, which can be used when typing Latin transcriptions directly into the main output area. (This was available in v15 too, but has been made into a panel like the others, which can be hidden when not needed.)

  • Tones for automatic IPA transcriptions. The automatic transcription to IPA now adds tone marks. These are usually correct, but, as with other aspects of the transcription, it doesn’t take into account the odd idiosyncrasy in Thai spelling, so you should always check that the output is correct. (Note that there is still an issue for some of the ambiguous transcription cases, mostly involving RA.)

For more information about the picker, see the notes at the bottom of the picker page.

About pickers: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more usable than a regular character map utility. See the list of available pickers.

I have uploaded a new version of the Thai character picker.

The new version uses characters instead of images for the selection table, making it faster to load and more flexible, and dispenses with the transcription view. If you prefer, you can still access the previous version.

Other changes include:

  • Significant rearrangement of the default selection table. The new arrangement makes it easy to choose the right characters if you have a Latin transcription to hand, which allows the removal of the previous transcription view, at the same time as speeding up that type of picking.
  • Addition of latin prompts to help locate letters (standard with v15).
  • Automatic transcription from Thai into ISO 11940-1, ISO 11940-2 and IPA. Note that for the last two there are some corner cases where the results are not quite correct, due to the ambiguity of the script, and note also that you need to show syllable boundaries with spaces before transcribing. (There’s a way to remove those spaces quickly afterwards.) See below for more information.
  • Hints! When switched on and you mouse over a character, other similar characters or characters incorporating the shape you moused over, are highlighted. Particularly useful for people who don’t know the script well, and may miss small differences, but also useful sometimes for finding a character if you first see something similar.
  • It also comes with the new v15 features that are standard, such as shape-based picking without losing context, range-selectable codepoint information, a rehabilitated escapes button, the ability to change the font of the table and the line-height of the output, and the ability to turn off autofocus on mobile devices to stop the keyboard jumping up all the time, etc.

For more information about the picker, see the notes at the bottom of the picker page.

About pickers: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more usable than a regular character map utility. See the list of available pickers.

More about the transcriptions: There are three buttons that allow you to convert from Thai text to Latin transcriptions. If you highlight part of the text, only that part will be transcribed.

The toISO-1 button produces an ISO 11940-1 transliteration, that latinises the Thai characters without changing their order. The result doesn’t normally tell you how to pronounce the Thai text, but it can be converted back to Thai as each Thai character is represented by a unique sequence in Latin. This transcription should produce fully conformant output. There is no need to identify syllables boundaries first.

The toISO-2 and toIPA buttons produce an output that is intended to approximately reflect actual pronunciation. It will work fine most of the time, but there are occasional ambiguities and idiosynchrasies in Thai which will cause the converter to render certain, less common syllables incorrectly. It also doesn’t automatically add accent marks to the phonetic version (though that may be added later). So the output of these buttons should be treated as something that gets you 90% of the way. NOTE: Before using these two buttons you need to add spaces or hyphens between each syllable of the Thai text. Syllable boundaries are important for correct interpretation of the text, and they are not detected automatically.

The condense button removes the spaces from the highlighted range (or the whole output area, if nothing is highlighted).

Note: For the toISO-2 transcription I use a macron over long vowels. This is non-standard.

I have uploaded a new version of the Tibetan character picker.

The new version dispenses with the images for the selection table. If you don’t have a suitable font to display the new version of the picker, you can still access the previous version, which uses images.

Other changes include:

  • Significant rearrangement of the default table, with many less common symbols moved into a location that you need to click on to reveal. This declutters the selection table.
  • Addition of latin prompts to help locate letters (standard with v15).
  • Hints (When switched on and you mouse over a character, other similar characters or characters incorporating the shape you moused over, are highlighted. Particularly useful for people who don’t know the script well, and may miss small differences, but also useful sometimes for finding a character if you first see something similar.)
  • A new Wylie button that converts Tibetan text into an extended Wylie Latin transcription. There are still some uncommon characters that don’t work, but it should cover most normal needs. I used diacritics over lowercase letters rather than uppercase letters, except for the fixed form characters. I also didn’t provide conversions for many of the symbols – they will appear without change in the transcription. See the notes on the page for more information.
  • The Codepoints button, which produces a list of characters in the output box, now has a new feature. If you have highlighted some text in the output box, you will only see a list of the highlighted characters. If there are no highlights, the contents of the whole output box are listed.
  • Don’t forget, if you are using the picker on an iPad or mobile device, to set Autofocus to Off before tapping on characters. This stops the device keypad popping up every time you select a character. (This is also standard for v15.)

About pickers: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more usable than a regular character map utility. See the list of available pickers.

If you use my Unicode character pickers, you may have noticed some changes recently. I’ve moved several pickers on to version 14. Most of the noticeable changes are in the location and styling of elements on the UI – the features remain pretty much unchanged.

Pages have acquired a header at the top (which is typically hidden), that provides links to related pages, and integrates the style into that of the rest of the site. What you don’t see is a large effort to tidy the code base and style sheets.

So far, I have changed the following: Arabic block, Armenian, Balinese, Bengali, Khmer, IPA, Lao, Mongolian, Myanmar, and Tibetan.

I will convert more as and when I get time.

However, in parallel, I have already made a start on version 15, which is a significant rewrite. Gone are the graphics, to be replaced by characters and webfonts. This makes a huge improvement to the loading time of the page. I’m also hoping to introduce more automated transcription methods, and simpler shape matching approaches.

Some of the pickers I already upgraded to version 14 have mechanisms for transcription and shape-based identification that took a huge effort to create, and will take a substantial effort to upgrade to version 15. So they may stay as they are for a while. However, easier to handle and new pickers will move to the new format.

Actually, I already made a start with Gurmukhi v15, which yanks that picker out of the stone-age and into the future. There’s also a new picker for the Uighur language that uses v15 technology. I’ll write separate blogs about those.

 

[By the way, if you are viewing the pickers on a mobile device such as an iPad, don’t forget to turn Autofocus off (click on ‘more controls’ to find the switch). This will stop the onscreen keyboard popping up, annoyingly, each time you try to tap on a character.]

Following up on a very good suggestion by Roger Sperberg, I added two webfonts to the Khmer picker and arranged the font selection list so that you can see which fonts are available on your Mac OS X or Windows7 system.

The webfonts make it possible to use the picker on an iPad or other device that doesn’t have a Khmer font installed. I added two webfonts because one worked on my iPad and the other didn’t, and it was vice versa on my Snow Leopard Macbook.

I also added an HTML5 placeholder for the output box. (I’m wishing you could style that differently from the standard content – and wishing that markup designers would think about this sort of thing and stop using attributes for natural language text…).

You can find the Khmer picker at http://rishida.net/scripts/pickers/khmer/

Picture of the page in action.

>> Use it

This picker contains characters from the Unicode Balinese block needed for writing the Balinese language. Characters needed for Sasak are also available in the Advanced section. Balinese musical notation characters are not included.

About the tool: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more usable than a regular character map utility.

About this picker: Characters are grouped to aid input. The consonant block includes characters needed for Kawi and Sanskrit as well as the native Balinese characters, all arranged according to the Brahmi pronunciation grid.

The picker has only a default view and a font grid view. It’s difficult to put in the time for the shape-based, keyboard-based, and various transcription-based views in some other pickers. In a new departure, however, I have included a list of Latin characters on the default view to assist in writing transcriptions alongside Balinese text.

There is, however, a significant issue with this picker, due to the lack of support for Balinese as a script in computers. The only Unicode-based Balinese font I know of is Aksara Bali, but that font seems to only work as expected in Firefox on Mac OS X. Furthermore, the Aksara Bali font doesn’t handle ra repa as described in the Unicode Standard. The sequence <consonant , adeg-adeg, ra repa> produces a visible adeg-adeg, rather than the post-fixed form of ra repa. The sequence <consonant , vowel sign ra repa> produces the post-fixed form of ra repa, rather than the subjoined form. You can produce the post-fixed form with this font by using <consonant , vowel sign ra repa> and the subjoined form by using <consonant , adeg-adeg, ra, pepet>, but these sequences will produce content that cannot be matched against sequences using the Unicode approach, and content that may fail with other Unicode-compliant fonts in the future.

Hopefully some new, fully Unicode-compliant fonts will come along soon. This is one of the most beautiful scripts I have come across.

(Btw, I’m working on a set of notes for Balinese characters, linked from UniView, with some feature innovations to get around the font issue. Look out for that later. And I’m thinking I should develop a Javanese picker to go with this one. Just need a bit of time…)

For the curious, here’s the first article of the Universal Declaration of Human Rights, as typed in the Balinese picker. Translation by Tri Ediwan (reproduced from Omniglot).

Picture of the page in action.

>> Use UniView

About the tool: Look up and see characters (using graphics or fonts) and property information, view whole character blocks or custom ranges, select characters to paste into your document, paste in and discover unknown characters, search for characters, do hex/dec/ncr conversions, highlight character types, etc. etc. Supports Unicode 6.0 and written with Web Standards to work on a variety of browsers. No need to install anything.

Latest changes: The majority of changes in this update relate to the user interface. They include the following:

  • Many controls have been grouped under three tabs: Look up, Filter, and Options. Various previously dispersed controls were gathered together under the Filter and Options tabs. Many of the controls have been slightly renamed.
  • The Search control has been moved to the top right of the window, where it is always visible.
  • The old Text Area is now a Copy & Paste control that has a 2-dimensional input box. In browser such as Safari, Chrome and Firefox 4, this box can be stretched by the user to whatever size is preferred.
  • The icon that provides a toggle switch between revealing detailed information for a character in a list or table, or copying that character to the Copy & Paste box has been redesigned. It stands alone and indicates the location of the current outcome using arrows.
    It looks like this: with the two arrows or this with the two arrows.
  • Title text has been provided for all controls, describing briefly what that control does. You can see this information by hovering over the control with the mouse.

Many of these changes were introduced to make it a little easier for newcomers to get to grips with UniView.

There were also some feature changes:

  • The ‘Codepoints’ control was converted to accept text as well as code points and renamed ‘Characters’. By default the control expect hex code point values, but this can be switched using the radio buttons. For text, you would usually use the ‘Copy & Paste’ control, but if you want to check out some characters without disturbing the contents of that control, you can now do so by setting the ‘Character’ radio button on the ‘Characters’ control.
  • The control to look up characters in the Unihan database the icon that looks like a Japanese character was fixed, but also extended to handle multiple characters at a time, opening a separate window for each character. (UniView warns you if you try to open more than 5 windows.)
  • The control to send characters to the Unicode Conversion tool the icon with overlapping boxes was fixed and now puts the character content of the field in the green box of the Converter Tool. If you need to convert hex or decimal code point values, do that in the converter.
  • The Show Age feature now works with lists, not just tables.

>> Use it

Inspired by some comments on John Well’s blog, I decided to add a keyboard layout to the IPA picker today. It follows the layout of Mark Huckvale’s Unicode Phonetic Keyboard (UCL) v1.01.

I can’t say I understand why many of the characters are allocated to the keys they are, but I figured that if John Wells uses this keyboard it would be probably worth using its layout.

Picture of the page in action.

>> Use it

This picker contains characters from the Unicode Mongolian block needed for writing the Mongolian language. It doesn’t include Sibe, Todo or Manchu characters. Mongolian is a complex script, and I am still familiarising myself with it. This is an initial trial version of a Mongolian picker, and as people use it and raise feedback I may need to make changes.

About the tool: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more usable than a regular character map utility.

About this picker: The output area for this picker is set up for vertical text. However, only Internet Explorer currently supports vertical text display, and only IE8 supports Mongolian’s left-to-right column progression. In addition, it seems that IE doesn’t support ltr columns in textarea elements. The bottom line is that, although the output area is the right shape and position for vertical text, mostly the output will be horizontal. You will see vertical text in IE, but the column positions will look wrong. Nevertheless, in any of these cases, when you cut and paste text into another document, the characters will still be correctly ordered.

Consonants are to the left, and in the order listed in the Wikipedia article about Mongolian text. To their right are vowels, then punctuation, spaces and control characters, and number digits. The variation selectors are positioned just below the consonants.

As you mouse over the letters, the various combining forms appear in a column to the far left. This is to help identify characters, for those less familiar with the alphabet.

Picture of the page in action.

>> Use it

In 1992 the Chinese government recognised the Fraser alphabet as the official script for the Lisu language and has encouraged its use since then. There are 630,000 Lisu people in China, mainly in the regions of Nujiang, Diqing, Lijiang, Dehong, Baoshan, Kunming and Chuxiong in the Yunnan Province. Another 350,000 Lisu live in Myanmar, Thailand and India. Other user communities are mostly Christians from the Dulong, the Nu and the Bai nationalities in China.

About the tool: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more usable than a regular character map utility.

Latest changes: This picker is new. The default view was modified from an original proposal by Benjamin Lee, and is likely to be more useful to people who are somewhat familiar with the alphabet and characters of Lisu. Characters are arranged to simplify entry, with consonants to the left, vowels to their right, and tone marks to their right.

There is also a keyboard view. Many of the positions of characters are based on keyboard layouts I have seen. Those keyboards, however, tended to use some ASCII characters for punctuation, when the Unicode Standard recommends other characters (in particular, MODIFIER LETTER LOW MACRON and MODIFIER LETTER APOSTROPHE) or omit some punctuation characters mentioned in the Unicode Standard. The current version of this keyboard, therefore adds some extra characters.

The layout is adequate, given that pickers assume availability of a QWERTY keyboard, however if a real standardised keyboard layout is to be made, it should involve some further changes. For example, people wanting to use syntax characters such as comma, period, semi-colon, single quote, etc, while writing the text in Lisu will need direct access to those characters. They are missing from this layout.

About the tool: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more useable than a regular character map utility.

Latest changes: This picker has been upgraded to use the version 10 look and feel, and incorporate new characters from Unicode version 5.2. Characters whose use is discouraged in Unicode have been moved to the advanced section – similar looking images in the main section put multiple characters into the output, as per NFC normalization.

>> Use it

Next Page »