
Picture of the page in action.

The page Mongolian variant forms has been significantly changed and moved to a new location. The main changes include the following:

  • The ‘NP’ column is now the ‘DS01’ column. It shows the shapes discussed by the Mongolian experts on the public-i18n-mongolian@w3.org list with a view to updating and standardising the results of using Unicode variants. The column reflects the state of that document as of 25 Nov 2016. Expected updates to the document are shown using “[current] ≫ [expected]”. As those and other changes are made to DS01, I will update this variant comparison page.
  • I added a column for L2/16-309 (the output of the Hohhot discussions) as of 26 Oct 2016. Differences between DS01 and L2/16-309 are all highlighted for quick reference.
  • I also added a column for the Unicode Standard v9 chart data, and again highlighted differences from DS01 or L2/16-309. To see this column, click on the vertical blue bar, bottom right, then set Hidden Columns to Show.
  • Showing the hidden columns also shows a usage column and a notes
    column (information in both cases taken from L2/16-309).
  • All the font information has been rechecked, and some bugs fixed.
  • Other editorial changes include the hiding of former notes, and the
    simplification and update of the page intro.

Picture of the page in action.

Another significant development is the creation of a Shape Index. That document lists all shapes used in the page Mongolian Variants, and enables you to jump to the appropriate table in that page so that you can see how it’s used.

The classification used is not intended to be etymologically or philosophically pure. It is intended as a simple practical tool to help locate shapes, and one that also works for novices.

I hope these changes will help us identify and resolve the remaining differences between the documents. I will update these pages as the source documents change (please advise me when new versions are developed). Of course, comments are welcome.

If you want to join the discussion about Mongolian variants, you can subscribe to this mailing list. (Hit the subscribe link.)

Picture of the page in action.
>> Use the picker

An update to version 17 of the Mongolian character picker is now available.

When you hover over or select a character in the selection area, the box to the left of that area displays the alternate glyph forms that are appropriate for that character. By default, this only happens when you click on a character, but you can make it happen on hover by clicking on the V in the gray selection bar to the right.

The list includes the default positional forms as well as the forms produced by following the character with a Free Variation Selector (FVS). The latter forms have been updated, based on work which has been taking place in 2015 to standardise the forms produced by using FVS. At the moment, not all fonts will produce the expected shapes for all possible combinations. (For more information, see Notes on Mongolian variant forms.)
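As a rough illustration of what "following the character with an FVS" means at the code point level, here is a minimal Python sketch. The helper name `fvs_sequences` is hypothetical and is not part of the picker. U+180B, U+180C and U+180D are the three Free Variation Selectors in the Unicode Mongolian block. The sketch only builds the character sequences. Which shape each sequence actually produces depends on the font, and not every combination is standardised for every letter.

```python
# Mongolian Free Variation Selectors from the Unicode Mongolian block.
FVS = {
    "FVS1": "\u180B",
    "FVS2": "\u180C",
    "FVS3": "\u180D",
}


def fvs_sequences(base: str) -> dict[str, str]:
    """Return the bare character plus one sequence per FVS.

    Hypothetical helper for illustration only: it does not know which
    combinations are standardised for a given letter.
    """
    sequences = {"default": base}
    for name, selector in FVS.items():
        # The selector always follows the character it modifies.
        sequences[name] = base + selector
    return sequences


def code_points(text: str) -> str:
    """Format a string as space-separated U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)
```

For example, `fvs_sequences("\u1820")` builds the four candidate sequences for MONGOLIAN LETTER A, and `code_points` can be used to print them in a form that is easy to compare against a variants table.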

An additional new feature is that when the variant list is displayed, you can add an appropriate FVS character to the output area by simply clicking in the list on the shape that you want to see in the output.

This provides an easy way to check what shapes should be produced and what shapes are produced by a given font. (You can specify which font the app should use for display of the output.)

Some small improvements were also made to the user interface. The picker works best in Firefox and Edge desktop browsers, since they now have pretty good support for vertical text. It works least well in Safari (which includes the iPad browsers).

For more information about the picker, see the notes at the bottom of the picker page.

About pickers: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes a picker much more usable than a regular character map utility. See the list of available pickers.

There is some confusion about which shapes should be produced by fonts for Mongolian characters. Most letters have at least one isolated, initial, medial and final shape, but other shapes are produced by contextual factors, such as vowel harmony.
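One way to inspect the four positional shapes a font produces is to place ZERO WIDTH JOINER (U+200D) next to a letter, which requests joining behaviour on that side. The Python sketch below builds such test strings. It assumes the font and rendering engine honour ZWJ for Mongolian, which may not be true everywhere, and the name `positional_test_strings` is hypothetical. It does not cover shapes that depend on other context, such as vowel harmony.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER


def positional_test_strings(letter: str) -> dict[str, str]:
    """Build strings that ask a renderer for each positional form.

    Assumption: the renderer treats ZWJ as a joining neighbour.
    """
    return {
        "isolated": letter,            # no joining neighbours
        "initial": letter + ZWJ,       # joins to what follows
        "medial": ZWJ + letter + ZWJ,  # joins on both sides
        "final": ZWJ + letter,         # joins to what precedes
    }
```

Pasting the four strings for a given letter into a page styled with the font under test gives a quick visual check of its default positional forms.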

Unicode has a list of standardised variant shapes, dating from 27 November 2013, but that list is not complete and contains what are currently viewed by some as errors. It also doesn’t specify the expected default shapes for initial, medial and final positions.

The original list of standardised variants was based on 蒙古文编码 by Professor Quejingzhabu in 2000.

A new proposal was published on 20 January 2014, which attempts to resolve the current issues, although I think that it introduces one or two issues of its own.

The other factor in this is what the actual fonts do. Sometimes they follow the Unicode standardised variants list, other times they diverge from it. Occasionally a majority of implementations appear to diverge in the same way, suggesting that the standardised list should be adapted to reality.

To help unravel this, I put together a page called Notes on Mongolian variant forms that visually shows the changes between the various proposals, and compares the results produced by various fonts.

This is still an early draft. The information only covers the basic Mongolian range – Todo, Sibe, etc. are still to come. Also, I would like to add information about other fonts, if I can obtain them.

Update, 16 Apr 2015: The Todo, Sibe, Manchu, Sanskrit and Tibetan characters are now all done, and font information has been added for them. (And the document was moved to GitHub.)

If you use my Unicode character pickers, you may have noticed some changes recently. I’ve moved several pickers on to version 14. Most of the noticeable changes are in the location and styling of elements on the UI – the features remain pretty much unchanged.

Pages have acquired a header at the top (which is typically hidden), that provides links to related pages, and integrates the style into that of the rest of the site. What you don’t see is a large effort to tidy the code base and style sheets.

So far, I have changed the following: Arabic block, Armenian, Balinese, Bengali, Khmer, IPA, Lao, Mongolian, Myanmar, and Tibetan.

I will convert more as and when I get time.

However, in parallel, I have already made a start on version 15, which is a significant rewrite. Gone are the graphics, to be replaced by characters and webfonts. This makes a huge improvement to the loading time of the page. I’m also hoping to introduce more automated transcription methods, and simpler shape matching approaches.

Some of the pickers I already upgraded to version 14 have mechanisms for transcription and shape-based identification that took a huge effort to create, and will take a substantial effort to upgrade to version 15. So they may stay as they are for a while. However, new pickers and those that are easier to handle will move to the new format.

Actually, I already made a start with Gurmukhi v15, which yanks that picker out of the stone-age and into the future. There’s also a new picker for the Uighur language that uses v15 technology. I’ll write separate blogs about those.

 

[By the way, if you are viewing the pickers on a mobile device such as an iPad, don’t forget to turn Autofocus off (click on ‘more controls’ to find the switch). This will stop the onscreen keyboard popping up, annoyingly, each time you try to tap on a character.]

If you put a span tag around one or two letters in an Arabic word, say to change the colour, WebKit and Blink browsers break the cursive joining between the letters. You can change things like colour in Mozilla and IE, but changing the font still breaks the connection.
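To reproduce the behaviour in a given browser, it helps to have a minimal test case. The Python sketch below generates one by wrapping a slice of a word in a coloured span. The function name `colour_span_test` and its parameters are hypothetical, and the markup is only one plausible way to write such a test, not the markup used by any particular page.

```python
from html import escape


def colour_span_test(word: str, start: int, end: int, colour: str = "red") -> str:
    """Wrap word[start:end] in a coloured span inside an Arabic paragraph."""
    if not 0 <= start < end <= len(word):
        raise ValueError("start/end must select a non-empty slice of word")
    before, inside, after = word[:start], word[start:end], word[end:]
    return (
        f'<p lang="ar" dir="rtl">{escape(before)}'
        f'<span style="color: {escape(colour)}">{escape(inside)}</span>'
        f"{escape(after)}</p>"
    )
```

Calling `colour_span_test("عمان", 1, 2)` colours only the second letter of the word. If the letters on either side of the span no longer join when the result is opened in a browser, that browser shows the problem described above.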

Breaking on colour change makes it hard to represent educational texts and things such as the Omantel logo, which I saw all over Muscat recently. (Omantel is the largest internet provider in Oman.) Note how, despite the colour change, the Arabic letters in the logo below (on the left) still join.

Picture of the Omantel logo.
Multi-coloured Omantel Arabic logo on a building in Muscat.

Here’s an example of an educational page that colours parts of words. You currently have to use Firefox or IE to get the desired effect.

This led to questions about what to do if you convert block elements, such as li, into inline elements that sit side by side. You probably don’t want the character at the end of one li element to join with the first character of the next. And if there is padding or a margin between them, should that cause bidi isolation as well as preventing joining behaviour?

See a related thread on the W3C Internationalization and CSS lists.

Picture of the page in action.

>> Use it

This picker contains characters from the Unicode Mongolian block needed for writing the Mongolian language. It doesn’t include Sibe, Todo or Manchu characters. Mongolian is a complex script, and I am still familiarising myself with it. This is an initial trial version of a Mongolian picker, and as people use it and raise feedback I may need to make changes.
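For a rough idea of which letters that covers, the Python sketch below filters the letter range of the Unicode Mongolian block by character name, dropping any letter whose name marks it as Todo, Sibe or Manchu. This is only an approximation based on the Unicode names in Python's `unicodedata` module. It is not the picker's actual character list, which also includes punctuation, digits and control characters, and `basic_mongolian_letters` is a hypothetical name.

```python
import unicodedata

# Letters for other languages carry the language name in their Unicode name.
EXCLUDED = ("TODO", "SIBE", "MANCHU")


def basic_mongolian_letters() -> list[str]:
    """Letters in U+1820..U+1877 with no Todo/Sibe/Manchu marker in the name."""
    letters = []
    for cp in range(0x1820, 0x1878):
        name = unicodedata.name(chr(cp), "")
        if not name.startswith("MONGOLIAN LETTER"):
            continue  # unassigned or not a letter
        if any(tag in name for tag in EXCLUDED):
            continue  # Todo, Sibe or Manchu letter
        letters.append(chr(cp))
    return letters
```

Printing `unicodedata.name(ch)` for each returned character gives a quick list to compare against the consonant and vowel areas of the picker.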

About the tool: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes a picker much more usable than a regular character map utility.

About this picker: The output area for this picker is set up for vertical text. However, only Internet Explorer currently supports vertical text display, and only IE8 supports Mongolian’s left-to-right column progression. In addition, it seems that IE doesn’t support ltr columns in textarea elements. The bottom line is that, although the output area is the right shape and position for vertical text, mostly the output will be horizontal. You will see vertical text in IE, but the column positions will look wrong. Nevertheless, in any of these cases, when you cut and paste text into another document, the characters will still be correctly ordered.

Consonants are to the left, and in the order listed in the Wikipedia article about Mongolian text. To their right are vowels, then punctuation, spaces and control characters, and number digits. The variation selectors are positioned just below the consonants.

As you mouse over the letters, the various combining forms appear in a column to the far left. This is to help identify characters, for those less familiar with the alphabet.