This update to the Language Subtag Lookup tool brings back the Check function, which had been out of action since last January. The code had to be almost completely rewritten to migrate it from the original PHP. In the process, I added support for extension and private-use subtags, and added several more checks. I also made various changes to the way the results are displayed.

Give it a try with this rather complicated, but valid language tag: zh-cmn-latn-CN-pinyin-fonipa-u-co-phonebk-x-mytag-yourtag

Or try this rather badly conceived language tag, to see some error messages: mena-fr-latn-fonipa-biske-x-mylongtag-x-shorter
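As a rough illustration of the kind of structural check involved, here is a simplified Python sketch that walks a tag from left to right using the subtag shapes defined in RFC 5646 (BCP 47). It is not the tool's code. The name `parse_tag` is hypothetical, the sketch does no registry lookups (so it can't tell that a well-formed subtag is unregistered), it ignores grandfathered tags, and it rejects four-letter primary language subtags, which RFC 5646 reserves for future use.

```python
import re

def parse_tag(tag: str) -> list[tuple[str, str]]:
    """Label each subtag of a BCP 47 tag; raise ValueError at the first problem.

    Simplified sketch: output is lowercased, nothing is checked against the
    IANA registry, and grandfathered tags are not recognised.
    """
    subtags = tag.lower().split("-")
    labelled: list[tuple[str, str]] = []
    pos = 0

    def matches(pattern: str) -> bool:
        return pos < len(subtags) and re.fullmatch(pattern, subtags[pos]) is not None

    def take(kind: str) -> None:
        nonlocal pos
        labelled.append((kind, subtags[pos]))
        pos += 1

    if not matches("x"):  # a tag may consist of private-use subtags only
        if matches(r"[a-z]{2,3}"):
            take("language")
            for _ in range(3):  # up to three extlang subtags
                if matches(r"[a-z]{3}"):
                    take("extlang")
        elif matches(r"[a-z]{5,8}"):
            take("language")
        else:  # includes 4-letter subtags, reserved for future use
            raise ValueError(f"invalid primary language subtag {subtags[0]!r}")
        if matches(r"[a-z]{4}"):
            take("script")
        if matches(r"[a-z]{2}|[0-9]{3}"):
            take("region")
        seen_variants = set()
        while matches(r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}"):
            if subtags[pos] in seen_variants:
                raise ValueError(f"duplicate variant {subtags[pos]!r}")
            seen_variants.add(subtags[pos])
            take("variant")
        seen_singletons = set()
        while matches(r"[0-9a-wyz]"):  # extension singletons (any except x)
            if subtags[pos] in seen_singletons:
                raise ValueError(f"duplicate extension {subtags[pos]!r}")
            seen_singletons.add(subtags[pos])
            take("singleton")
            if not matches(r"[a-z0-9]{2,8}"):
                raise ValueError("extension singleton with no subtags")
            while matches(r"[a-z0-9]{2,8}"):
                take("extension")
    if matches("x"):
        take("singleton")
        if not matches(r"[a-z0-9]{1,8}"):
            raise ValueError("private-use subtag missing or longer than 8 characters")
        while matches(r"[a-z0-9]{1,8}"):
            take("private use")
    if pos < len(subtags):
        raise ValueError(f"unexpected subtag {subtags[pos]!r}")
    return labelled
```

With the first example tag, `parse_tag` labels every subtag: language, extlang, script, region, two variants, the `u` extension and the private-use subtags. The second example fails immediately at `mena`. This sketch stops at the first problem, whereas a checker that reports several errors at once has to keep going after each one.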

The IANA database information is up to date: the tool currently supports the IANA Language Subtag Registry of 2014-12-17. It reports subtags for 8,081 languages, 228 extlangs, 174 scripts, 301 regions and 68 variants, plus 26 grandfathered tags.
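The registry itself is a plain-text file of records separated by `%%` lines, and each record has a single `Type:` field (language, extlang, script, region, variant, grandfathered or redundant). The Python sketch below shows one way counts like these could be tallied. The function name is hypothetical and the sample is a heavily abbreviated excerpt in the registry's format, not the real file.

```python
from collections import Counter

def count_by_type(registry_text: str) -> Counter:
    """Count registry records by their Type field.

    Each record has exactly one 'Type:' line; continuation lines of long
    fields start with whitespace, so they never match.
    """
    counts: Counter = Counter()
    for line in registry_text.splitlines():
        if line.startswith("Type:"):
            counts[line.split(":", 1)[1].strip()] += 1
    return counts

SAMPLE = """File-Date: 2014-12-17
%%
Type: language
Subtag: aa
Description: Afar
%%
Type: script
Subtag: Latn
Description: Latin
%%
Type: grandfathered
Tag: i-klingon
Description: Klingon
"""

print(count_by_type(SAMPLE))
```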

I have uploaded another new version of the Thai character picker.

Sorry this follows so quickly on the heels of version 15, but as soon as I uploaded v15 several ideas on how to improve it popped into my head. This is the result. I will hopefully bring all the pickers, one by one, up to the new version 16 format. If you prefer, you can still access version 12.

The main changes include:

  • UI. Adjustment of the vertical menu, so that input features can be turned on and off independently, and new panels appear with the others, rather than toggling from one to another. So, for example, you can have hints and shape-based selectors turned on at the same time. When something is switched on, its label in the menu turns orange, and the full text of the option is followed by a check mark.
  • Transcription panels. Panels have been added to enable you to construct some Thai text when working from a Latin transcription. This brings the transcription inputs of version 12 into version 16, but in a more compact and simpler way, and in a way that gives you continued access to the standard table for special characters.

    There are currently options to transcribe from ISO 11940-2 (although there are some gaps in that), or from the transcription used by Benjawan Poomsan Becker in her book, Thai for Beginners. These are both transcriptions based on phonetic renderings of the Thai, so there is often ambiguity about how to transcribe a particular Latin letter into Thai. When such an ambiguity occurs, the interface offers you a choice via a small pop-up. Just click on the character you want and it will be inserted into the main output area.

    The transcription panels are useful because you can add a whole vowel at a time, rather than picking the individual vowel signs that compose it. An issue arises, however, when the vowel signs that make up a given vowel contain one that appears to the left of the syllable-initial consonant(s). This is easily solved by highlighting the syllable in question and clicking on the reorder button. The vowel sign in question will then appear as the first item in the highlighted text.

    There is also a panel containing non-ASCII Latin characters, which can be used when typing Latin transcriptions directly into the main output area. (This was available in v15 too, but has been made into a panel like the others, which can be hidden when not needed.)

  • Tones for automatic IPA transcriptions. The automatic transcription to IPA now adds tone marks. These are usually correct, but, as with other aspects of the transcription, it doesn’t take into account the odd idiosyncrasy in Thai spelling, so you should always check that the output is correct. (Note that there is still an issue for some of the ambiguous transcription cases, mostly involving RA.)
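The reorder step described in the transcription panels item can be sketched in Python as follows. The sketch assumes the highlighted text is passed in as a string, and that the left-side vowel signs are the five Thai preposed vowels U+0E40 to U+0E44 (เ แ โ ใ ไ), which Unicode stores before the consonant they are written in front of. The function name is hypothetical, not the picker's.

```python
# Thai vowel signs written (and stored in Unicode) before the syllable-initial consonant.
PREPOSED_VOWELS = {"\u0e40", "\u0e41", "\u0e42", "\u0e43", "\u0e44"}  # เ แ โ ใ ไ

def reorder(selection: str) -> str:
    """Move the first preposed vowel sign in the selection to the front."""
    for i, ch in enumerate(selection):
        if ch in PREPOSED_VOWELS:
            return ch + selection[:i] + selection[i + 1:]
    return selection  # nothing to move

# Adding the vowel เ-ีย after typing ก gives ก + เ + ี + ย in the output area.
typed = "\u0e01\u0e40\u0e35\u0e22"
print(reorder(typed))  # เกีย
```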

For more information about the picker, see the notes at the bottom of the picker page.

About pickers: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more usable than a regular character map utility. See the list of available pickers.

I have uploaded a new version of the Thai character picker.

The new version uses characters instead of images for the selection table, making it faster to load and more flexible, and dispenses with the transcription view. If you prefer, you can still access the previous version.

Other changes include:

  • Significant rearrangement of the default selection table. The new arrangement makes it easy to choose the right characters if you have a Latin transcription to hand, so the previous transcription view could be removed, and that type of picking is now faster.
  • Addition of Latin prompts to help locate letters (standard with v15).
  • Automatic transcription from Thai into ISO 11940-1, ISO 11940-2 and IPA. Note that for the last two there are some corner cases where the results are not quite correct, due to the ambiguity of the script, and note also that you need to show syllable boundaries with spaces before transcribing. (There’s a way to remove those spaces quickly afterwards.) See below for more information.
  • Hints! When hints are switched on and you mouse over a character, similar characters, or characters incorporating the shape you moused over, are highlighted. This is particularly useful for people who don’t know the script well and may miss small differences, but it can also help you find a character if you first spot something similar.
  • It also comes with the new v15 features that are standard, such as shape-based picking without losing context, range-selectable codepoint information, a rehabilitated escapes button, the ability to change the font of the table and the line-height of the output, and the ability to turn off autofocus on mobile devices to stop the keyboard jumping up all the time, etc.
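One simple way hint highlighting could work is a lookup from each character to the characters it should light up. In this Python sketch the groups of look-alike Thai consonants are hypothetical examples chosen for illustration, not the picker's actual data. The groups are also symmetric, whereas "characters incorporating the shape" could be a one-directional relationship.

```python
# Hypothetical look-alike groups, for illustration only.
LOOKALIKE_GROUPS = ["บปษ", "ผฝพฟ", "ถภ", "ดต"]

def build_hint_index(groups: list[str]) -> dict[str, set[str]]:
    """Map each character to the other members of every group it belongs to."""
    index: dict[str, set[str]] = {}
    for group in groups:
        for ch in group:
            index.setdefault(ch, set()).update(c for c in group if c != ch)
    return index

HINTS = build_hint_index(LOOKALIKE_GROUPS)

def hints_for(ch: str) -> list[str]:
    """Characters to highlight when the mouse is over ch, in code point order."""
    return sorted(HINTS.get(ch, set()))

print(hints_for("พ"))
```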

For more information about the picker, see the notes at the bottom of the picker page.

More about the transcriptions: There are three buttons that allow you to convert from Thai text to Latin transcriptions. If you highlight part of the text, only that part will be transcribed.

The toISO-1 button produces an ISO 11940-1 transliteration, which latinises the Thai characters without changing their order. The result doesn’t normally tell you how to pronounce the Thai text, but it can be converted back to Thai, because each Thai character is represented by a unique sequence in Latin. This transcription should produce fully conformant output. There is no need to identify syllable boundaries first.
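The reversibility described here comes from the character-by-character mapping. This Python sketch uses only six mappings, assumed to match ISO 11940-1 (check them against the standard before relying on them), and decodes with greedy longest match, which is unambiguous for this small subset. With the full table you would need to confirm that no combination of shorter sequences can be mistaken for a longer one.

```python
# Six mappings assumed to follow ISO 11940-1; verify against the standard.
TO_LATIN = {
    "\u0e01": "k",       # ก
    "\u0e04": "kh",      # ค
    "\u0e07": "ng",      # ง
    "\u0e19": "n",       # น
    "\u0e21": "m",       # ม
    "\u0e32": "\u0101",  # า -> ā
}
TO_THAI = {latin: thai for thai, latin in TO_LATIN.items()}
LONGEST = max(len(latin) for latin in TO_THAI)

def to_iso1(thai: str) -> str:
    """Transliterate one character at a time, keeping the Thai order."""
    return "".join(TO_LATIN.get(ch, ch) for ch in thai)

def from_iso1(latin: str) -> str:
    """Invert to_iso1 by greedy longest match (unambiguous for this subset)."""
    out, i = [], 0
    while i < len(latin):
        for size in range(LONGEST, 0, -1):
            chunk = latin[i:i + size]
            if len(chunk) == size and chunk in TO_THAI:
                out.append(TO_THAI[chunk])
                i += size
                break
        else:  # unmapped character: copy it through unchanged
            out.append(latin[i])
            i += 1
    return "".join(out)

word = "\u0e04\u0e32\u0e07"  # คาง
print(to_iso1(word))                     # khāng
print(from_iso1(to_iso1(word)) == word)  # True
```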

The toISO-2 and toIPA buttons produce output that is intended to approximately reflect actual pronunciation. They will work fine most of the time, but there are occasional ambiguities and idiosyncrasies in Thai which will cause the converter to render certain less common syllables incorrectly. They also don’t automatically add accent marks to the phonetic version (though that may be added later). So the output of these buttons should be treated as something that gets you 90% of the way. NOTE: Before using these two buttons you need to add spaces or hyphens between each syllable of the Thai text. Syllable boundaries are important for correct interpretation of the text, and they are not detected automatically.
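Because the pronunciation-based rules work syllable by syllable, the converter has to rely on the separators you type. This is a minimal Python sketch of that step, in which `syllable_fn` is a hypothetical stand-in for the real per-syllable ISO 11940-2 or IPA rules, and the separators are passed through unchanged.

```python
import re
from typing import Callable

def transcribe(text: str, syllable_fn: Callable[[str], str]) -> str:
    """Apply syllable_fn to each space- or hyphen-delimited syllable, keeping separators."""
    # A capturing group makes re.split return the separators at the odd indices.
    parts = re.split(r"([ \-]+)", text)
    return "".join(
        part if i % 2 or not part else syllable_fn(part)
        for i, part in enumerate(parts)
    )
```

For example, `transcribe("ภา ษา", lambda s: f"[{s}]")` returns `"[ภา] [ษา]"`, showing that each syllable is handled on its own.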

The condense button removes the spaces from the highlighted range (or the whole output area, if nothing is highlighted).
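The condense behaviour is simple enough to sketch directly in Python. The sketch assumes the highlight is given as start and end offsets into the output text (a hypothetical interface), and treats an empty selection as "nothing highlighted". Only spaces are removed, as described above.

```python
from typing import Optional

def condense(text: str, start: int = 0, end: Optional[int] = None) -> str:
    """Remove spaces from text[start:end], or from the whole text if nothing is selected."""
    if end is None or start == end:  # no highlight: condense the whole output area
        start, end = 0, len(text)
    return text[:start] + text[start:end].replace(" ", "") + text[end:]

print(condense("ภา ษา ไทย"))  # ภาษาไทย
```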

Note: For the toISO-2 transcription I use a macron over long vowels. This is non-standard.

I have uploaded a new version of the Tibetan character picker.

The new version dispenses with the images for the selection table. If you don’t have a suitable font to display the new version of the picker, you can still access the previous version, which uses images.

Other changes include:

  • Significant rearrangement of the default table, with many less-common symbols moved to a location that you need to click on to reveal. This declutters the selection table.
  • Addition of Latin prompts to help locate letters (standard with v15).
  • Hints. When hints are switched on and you mouse over a character, similar characters, or characters incorporating the shape you moused over, are highlighted. This is particularly useful for people who don’t know the script well and may miss small differences, but it can also help you find a character if you first spot something similar.
  • A new Wylie button that converts Tibetan text into an extended Wylie Latin transcription. There are still some uncommon characters that don’t work, but it should cover most normal needs. I used diacritics over lowercase letters rather than uppercase letters, except for the fixed form characters. I also didn’t provide conversions for many of the symbols – they will appear without change in the transcription. See the notes on the page for more information.
  • The Codepoints button, which produces a list of characters in the output box, now has a new feature. If you have highlighted some text in the output box, you will only see a list of the highlighted characters. If there are no highlights, the contents of the whole output box are listed.
  • Don’t forget, if you are using the picker on an iPad or mobile device, to set Autofocus to Off before tapping on characters. This stops the device keypad popping up every time you select a character. (This is also standard for v15.)
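The Codepoints behaviour described in the list can be sketched in Python, assuming the highlight arrives as start and end offsets (a hypothetical interface, not the picker's). Character names come from the standard `unicodedata` module.

```python
import unicodedata
from typing import Optional

def codepoints(text: str, start: int = 0, end: Optional[int] = None) -> list[str]:
    """List 'U+XXXX NAME' for the selected characters, or for all of text if nothing is selected."""
    if end is None or start == end:  # no highlight: list the whole output box
        start, end = 0, len(text)
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text[start:end]]

print(codepoints("\u0f40\u0f72"))
```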

There is some confusion about which shapes should be produced by fonts for Mongolian characters. Most letters have at least one isolated, initial, medial and final shape, but other shapes are produced by contextual factors, such as vowel harmony.

Unicode has a list of standardised variant shapes, dating from 27 November 2013, but that list is not complete and contains what are currently viewed by some as errors. It also doesn’t specify the expected default shapes for initial, medial and final positions.
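Two of the inputs to shape selection are the letter's position in the word and any free variation selector that follows it. This Python sketch computes just those two, using the three free variation selectors FVS1 to FVS3 (U+180B to U+180D). It is a simplification: every letter is treated as dual-joining, MVS, NNBSP, ZWJ and ZWNJ are ignored, and contextual rules such as vowel harmony are left out entirely. The function name is hypothetical.

```python
from typing import Optional

# Mongolian free variation selectors FVS1-FVS3.
FVS = {"\u180b": 1, "\u180c": 2, "\u180d": 3}

def positional_forms(word: str) -> list[tuple[str, str, Optional[int]]]:
    """Return (letter, position, FVS number or None) for each letter in a word."""
    letters: list[list] = []
    for ch in word:
        if ch in FVS:
            if letters:  # a selector modifies the letter before it
                letters[-1][1] = FVS[ch]
            continue
        letters.append([ch, None])

    forms = []
    last = len(letters) - 1
    for i, (ch, fvs) in enumerate(letters):
        if last == 0:
            position = "isolated"
        elif i == 0:
            position = "initial"
        elif i == last:
            position = "final"
        else:
            position = "medial"
        forms.append((ch, position, fvs))
    return forms
```

A font, or a table such as the standardised variants list, would then map each (letter, position, selector) combination to a glyph, which is exactly where the lists and the fonts can disagree.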

The original list of standardised variants was based on 蒙古文编码 (Mongolian script encoding) by Professor Quejingzhabu in 2000.

A new proposal was published on 20 January 2014, which attempts to resolve the current issues, although I think that it introduces one or two issues of its own.

The other factor in this is what the actual fonts do. Sometimes they follow the Unicode standardised variants list, other times they diverge from it. Occasionally a majority of implementations appear to diverge in the same way, suggesting that the standardised list should be adapted to reality.

To help unravel this, I put together a page called Notes on Mongolian variant forms that visually shows the changes between the various proposals, and compares the results produced by various fonts.

This is still an early draft. The information only covers the basic Mongolian range; Todo, Sibe, etc. are still to come. I would also like to add information about other fonts, if I can obtain them.

Update, 16 Apr 2015: The Todo, Sibe, Manchu, Sanskrit and Tibetan characters are now all done, and font information has been added for them. (The document has also been moved to GitHub.)