
>> Use it

Inspired by some comments on John Wells’s blog, I decided to add a keyboard layout to the IPA picker today. It follows the layout of Mark Huckvale’s Unicode Phonetic Keyboard (UCL) v1.01.

I can’t say I understand why many of the characters are allocated to the keys they are, but I figured that if John Wells uses this keyboard it would probably be worth using its layout.

Picture of the page in action.

>> Use it

This picker contains characters from the Unicode Mongolian block needed for writing the Mongolian language. It doesn’t include Sibe, Todo or Manchu characters. Mongolian is a complex script, and I am still familiarising myself with it. This is an initial trial version of a Mongolian picker, and as people use it and raise feedback I may need to make changes.

About the tool: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more usable than a regular character map utility.

About this picker: The output area for this picker is set up for vertical text. However, only Internet Explorer currently supports vertical text display, and only IE8 supports Mongolian’s left-to-right column progression. In addition, it seems that IE doesn’t support ltr columns in textarea elements. The bottom line is that, although the output area is the right shape and position for vertical text, mostly the output will be horizontal. You will see vertical text in IE, but the column positions will look wrong. Nevertheless, in any of these cases, when you cut and paste text into another document, the characters will still be correctly ordered.

Consonants are to the left, and in the order listed in the Wikipedia article about Mongolian text. To their right are vowels, then punctuation, spaces and control characters, and number digits. The variation selectors are positioned just below the consonants.

As you mouse over the letters, the various combining forms appear in a column to the far left. This is to help identify characters, for those less familiar with the alphabet.

Analyser: http://rishida.net/tools/analysestring/

Converter: http://rishida.net/tools/conversion/

The string analyser tool provides information about the characters in a string. One difference in this version is a new section “Data input as graphics”, where you see a horizontal sequence of graphics for each of the characters in the string you are analysing. This can be useful to get a screen snap of the characters. Of course, there is no combining or ligaturing behaviour involved – just a graphic per character.

You can reverse the character order for right-to-left scripts.

Another difference is that you can explode example text in the notes. Take this example: if you click on the Arabic word for Koran (red word near the bottom of the notes), you’ll see a pop-up window in the bottom right corner of the window that lists the characters in that word.

The other change is that the former “Related links” section in the sidebar is now called “Do more”, and the links carry the string you are analysing to the Converter or UniView apps.

Oh, and the page now remembers the options you set between refreshes, which makes life much easier.

The converter tool converts between characters and various escaped character formats. It was changed so that the “View names” button sends the characters to the string analyser tool. This means that you’ll now see graphics for the characters, and that, once on the string analyser page, you can change the amount of information displayed for each character (including showing font-based characters, if you need to).

I also fixed a bug related to the UTF-8 and UTF-16 input. Including spaces after the code values no longer triggers an error.
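As a rough illustration of this kind of conversion (a sketch only, not the converter's actual code), the following parses UTF-8 byte values written as hex, tolerating stray spaces after the values, and then lists the result as U+ code points and hexadecimal NCRs. The function names `utf8_hex_to_text` and `text_to_escapes` are hypothetical.

```python
def utf8_hex_to_text(hex_bytes):
    """Decode space-separated UTF-8 byte values such as 'D9 82 ' into text."""
    # split() with no argument discards leading, trailing and repeated
    # whitespace, so spaces after the last value are harmless.
    values = [int(token, 16) for token in hex_bytes.split()]
    return bytes(values).decode("utf-8")


def text_to_escapes(text):
    """Return the U+ code points and hexadecimal NCRs for each character."""
    code_points = " ".join("U+%04X" % ord(ch) for ch in text)
    ncrs = "".join("&#x%X;" % ord(ch) for ch in text)
    return code_points, ncrs


# Four Arabic letters, with trailing spaces after the last byte value.
text = utf8_hex_to_text("D9 82 D8 B1 D8 A2 D9 86   ")
print(text_to_escapes(text))
```

Splitting on whitespace rather than on a single space is what makes trailing or doubled spaces harmless here.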

PS: The string analyser tool has graphics for all new Unicode 6.0 characters; however, I haven’t updated the data for those characters yet. I was planning to do so with the next release of UniView, which should be in October, when the final Unicode database is available.

Picture of the page in action.

>> Use it

In 1992 the Chinese government recognised the Fraser alphabet as the official script for the Lisu language and has encouraged its use since then. There are 630,000 Lisu people in China, mainly in the regions of Nujiang, Diqing, Lijiang, Dehong, Baoshan, Kunming and Chuxiong in the Yunnan Province. Another 350,000 Lisu live in Myanmar, Thailand and India. Other user communities are mostly Christians from the Dulong, the Nu and the Bai nationalities in China.

About the tool: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more usable than a regular character map utility.

Latest changes: This picker is new. The default view was modified from an original proposal by Benjamin Lee, and is likely to be more useful to people who are somewhat familiar with the alphabet and characters of Lisu. Characters are arranged to simplify entry, with consonants to the left, vowels to their right, and tone marks to their right.

There is also a keyboard view. Many of the positions of characters are based on keyboard layouts I have seen. Those keyboards, however, tended to use some ASCII characters for punctuation where the Unicode Standard recommends other characters (in particular, MODIFIER LETTER LOW MACRON and MODIFIER LETTER APOSTROPHE), or omitted some punctuation characters mentioned in the Unicode Standard. The current version of this keyboard therefore adds some extra characters.

The layout is adequate, given that pickers assume availability of a QWERTY keyboard; however, if a real standardised keyboard layout is to be made, it should involve some further changes. For example, people wanting to use syntax characters such as comma, period, semi-colon, single quote, etc., while writing text in Lisu will need direct access to those characters. They are missing from this layout.

Picture of the page in action.

>> Use UniView lite

>> Use UniView

About the tool: Look up and see characters (using graphics or fonts) and property information, view whole character blocks or custom ranges, select characters to paste into your document, paste in and discover unknown characters, search for characters, do hex/dec/ncr conversions, highlight character types, etc. etc. Supports Unicode 5.2 and written with Web Standards to work on a variety of browsers. No need to install anything.

Latest changes: The major change in this update is the addition of an alternative UniView lite interface for the tool that makes it easier to use UniView in restricted screen sizes, such as on mobile devices. The lite interface offers a subset of the functionality provided in the full version, rearranges the user interface and sets up some different defaults (eg. list view is the default, rather than the matrix view). However, the underlying code is the same – only the initial markup and the CSS are different.

Another significant change is that when you click on a character in a list or matrix, that character is either added to the text area or detailed information for that character is displayed, but no longer both at the same time. You switch between the two possibilities by clicking on the icon. When the background is white (default), details are shown for the character. When the background is orange, the character will be added to the text area (like a character map or picker).

Information from my character database is now shown by default when you are shown detailed information for a character. The switch to disable this has been moved to the Options panel.

Text highlighted in red in information from the character database contains examples. In case you don’t have a font for viewing such examples, or in case you just want to better understand the component characters, you can now click on these and the component characters will be listed in a new window (using the String Analyzer tool).

Access to the Settings panel has been moved slightly downwards, and the panel has been renamed Options in the full version.

The default order for items in lists is now <character><codepoint><name>, rather than the previous <codepoint><character><name>. This can still be changed in the Options panel, or by setting query parameters.

I changed the Next and Previous functions in the character detail pane so that they move one codepoint at a time through the Unicode encoding space. The controls are now buttons rather than images.

About the tool: The vocab tester allows you to review vocabulary or phrases. The vocab files can be located anywhere on the Web, and you can create them yourself, using the trivially easy text format described below. You can also print out a list of vocabulary for review, for when you don’t have access to the computer.

Latest changes: You can now specify style information for the text using CSS. I also fixed a bug that was preventing the tool from working on Internet Explorer.

I also fixed one or two smaller bugs and improved the code slightly, the example links now all work, and I added a Russian example.

Read the instructions

>> Use it

About the tool: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more useable than a regular character map utility.

Latest changes: This picker has been upgraded to use the version 10 look and feel, and incorporate new characters from Unicode version 5.2. Characters whose use is discouraged in Unicode have been moved to the advanced section – similar looking images in the main section put multiple characters into the output, as per NFC normalization.
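To show what "as per NFC normalization" means for such characters, here is a small sketch using Python's standard `unicodedata` module. The character chosen, U+0958 DEVANAGARI LETTER QA, is only an illustration of a composition exclusion and is an assumption on my part: it is not necessarily one of the characters in this picker.

```python
import unicodedata


def code_points(text):
    """Format a string as a space-separated list of U+ code points."""
    return " ".join("U+%04X" % ord(ch) for ch in text)


# U+0958 is excluded from composition, so NFC replaces the single
# character with the sequence KA + NUKTA rather than keeping it.
precomposed = "\u0958"
normalized = unicodedata.normalize("NFC", precomposed)

print(code_points(precomposed), "->", code_points(normalized))
```

A picker that outputs the multi-character sequence directly produces text that is already in NFC.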

>> Use it

About the tool: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more useable than a regular character map utility.

Latest changes: Both pickers have been upgraded to use the version 10 look and feel.

The Arabic block picker now includes the latest characters added to the Arabic and Arabic Supplement blocks in Unicode 5.1. Characters are displayed using the shape view of version 10 pickers. This saves a lot of space on-screen.

The Ethiopic picker was also updated to include more recent characters from the Unicode Ethiopic block (added in version 4.1), and the layout was improved to make it easier to locate a character. It still covers only the basic Ethiopic block.

>> Use the Arabic Block picker

>> Use the Ethiopic picker

The new characters.

About the tool: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more useable than a regular character map utility.

Latest changes: I recently added U+2C71 LATIN SMALL LETTER V WITH RIGHT HOOK (labiodental tap or flap) to the IPA picker. This was in the IPA chart for a long time, but was only added to Unicode in version 5.1.

Today I also added, at the request of Dan McCloy, four prosodic markers: prosodic phrase, prosodic word, syllable and mora (see the second line of the picture).

Regular users will also notice that I recently upgraded the picker chrome to version 10, too.

>> Use it


Picture of the page in action.

About the tool: Look up and see characters (using graphics or fonts) and property information, view whole character blocks or custom ranges, select characters to paste into your document, paste in and discover unknown characters, search for characters, do hex/dec/ncr conversions, highlight character types, etc. etc. Supports Unicode 5.2 and written with Web Standards to work on a variety of browsers. No need to install anything.

Latest changes: The major change in this update is the addition of a function, Show age, to show the version of Unicode where a character was added (after version 1.1). The same information is also listed in the details given for a character in the lower right panel.

The trigger for context-sensitive help was reduced to the first character of a command name, rather than the whole command name. This improves behaviour for commands under More actions by allowing you to click on the command name rather than just the icon alongside to activate the command.

Some ‘quick start’ instructions were also added to the initial display to orient people new to the tool, and this help text was updated in various areas.

The highlighting mechanism was changed. Rather than highlight characters using a coloured border (which is typically not very visible), highlighting now works by greying out characters that are not highlighted. This also makes it clearer when nothing is highlighted.

In the recent past, when you converted a matrix to a list in the lower left panel, greyed-out rows would be added for non-characters. These are no longer displayed. Consequently, the command to remove such rows from the list (previously under More actions) has been removed.

A lot of invisible work went into replacing style attributes in the code with class names. This produces better source code, but doesn’t affect the user experience.

>> Use it


Picture of the page in action.
 
Picture of the page in action.

About the tools: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more usable than a regular character map utility.

Latest changes: The Urdu and Tamil pickers have been upgraded to version 10. This provides new views of the data, but also involved a thorough overhaul and redesign of the pickers. Transliteration functions have also been added for the Tamil picker.

In addition, the Urdu notes page was updated and a new Tamil notes page was created. Database entries were also updated or, in the case of Tamil, created to support the notes pages. These notes pages are the first to use a new look and feel, based on the analyse-string tool I produced earlier this year. This adds information about each character from the Unicode descriptions data to that from my own database.

Picture of the page in action.

About the tool: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more useable than a regular character map utility.

Latest changes: Over the Christmas break I’ve applied version 10 upgrades to the following pickers: Bengali, Hebrew, Khmer, Lao, Malayalam, Myanmar, Thai and Tifinagh. In the case of Hebrew and Tifinagh, this came down to completely rewriting the pickers.

Key changes in version 10 include the following:

  • The visible layout of the shape view has been reduced in the vertical direction by showing a group of characters only when you mouse over the orange keys at the top. This makes it easier and faster to locate characters, and also improves use on screens with restricted space. The way similar characters in other groups are handled has been reinvented to fit the new approach better, and to enable faster creation of pickers in the future.
  • The visible layout of the transcription view has been adapted in a similar way to the shape view.
  • The button to dump the phonetic buffer has been moved to just below the output area.
  • The Detail button is now called the Analyse button, and both this and the Codepoints commands now bring up the new String Analyser utility, which provides much better results than the old pages.
  • A keyboard view has been added to the Tifinagh picker. This new view may pop up in other pickers in the future.

There were a number of other changes to the code, and not least to the instructions for use on the main picker page and each set of notes below the pickers themselves.

>> Use it


Picture of the page in action.

About the tool: This tool shows you what characters are in a string of Unicode characters, and gives you information about each one. Either type/paste the string into the box on the right of the page, or send it in the URL. It’s especially useful if you have no font for the text, or you are trying to unravel a sequence of characters in a complex script, but also allows you to just dig out information about one or more characters.

Here’s an example

By default you see a large graphic image of each character, the Unicode code point number and name, the Unicode script block in which it occurs, any annotations in the Unicode Standard, and any notes for that character in my character database (which I also updated today with information about Hebrew, Malayalam, Lisu and other scripts).
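For readers who want a feel for the per-character listing without opening the tool, the sketch below produces a much-reduced version of it with Python's standard `unicodedata` module. It is an approximation under stated assumptions, not the tool's code: the function name `analyse` is hypothetical, and the block name, Unicode annotations and database notes are left out because the standard library does not provide them.

```python
import unicodedata


def analyse(text):
    """Return (code point, name, general category) for each character."""
    rows = []
    for ch in text:
        # Some code points (e.g. controls) have no name; use a placeholder.
        name = unicodedata.name(ch, "<no name>")
        rows.append(("U+%04X" % ord(ch), name, unicodedata.category(ch)))
    return rows


# The Malayalam word for "Malayalam", one row per code point.
for row in analyse("\u0D2E\u0D32\u0D2F\u0D3E\u0D33\u0D02"):
    print(*row)
```

Listing code points one at a time like this is what makes it possible to unravel a sequence in a complex script even when no font is available.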

However, the result can be tailored in terms of the level of information and various aspects of the presentation. Simply click on the options to the right of the page, or (again) include the relevant info in the URI.

For example, you can remove any of these items of information individually (except the codepoint and name), or add a text version of the character. You can also choose a smaller graphic.

In addition, notes from my character database contain examples (coloured red). By clicking on these examples you can list the characters in the example text without leaving the page. The list of characters shows up in the right margin.

Oh, and you can click on links to see a character in UniView (to explore its Unicode properties) or to show the whole block in which the character lives.

You’ll shortly see my other applications such as pickers, UniView, etc, linking to this app.

Hope it’s useful.

>> Use it


Picture of the page in action.

About the tool: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more useable than a regular character map utility.

Latest changes: This is the first version 9 picker. Changes introduced in version 9 include moving the buttons that allow you to display different views to just below the page title. Also, in version 8 pickers, there was an icon in the phonic view that allowed you to dump to the output the phonetic transcription that builds up while selecting characters. This has been replaced with a button just below the output field. There were a number of other superficial changes.

A significant addition to the Malayalam picker is the ability to convert Malayalam text into a Latin transliteration, based on ISO 15919. There was already a way to convert Latin transliterations to Malayalam script.

This version also continues to allow you to type in chillu characters as either single characters as included in Unicode v5.1, or as a sequence of consonant+virama+zwj. Additions to the Malayalam repertoire added in v5.2 have not yet been added to the picker.
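The two chillu representations can be converted into each other mechanically. The sketch below is an illustration only, not the picker's code: the table reflects my understanding of the six atomic chillu letters added in Unicode 5.1 and the base consonant of each legacy sequence, and should be checked against the Unicode Standard before being relied on. The names `CHILLU_BASE`, `to_sequences` and `to_atomic` are hypothetical.

```python
VIRAMA = "\u0D4D"  # MALAYALAM SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER

# Atomic chillu letter -> base consonant of the equivalent legacy
# sequence consonant + virama + ZWJ.
CHILLU_BASE = {
    "\u0D7A": "\u0D23",  # CHILLU NN, from NNA
    "\u0D7B": "\u0D28",  # CHILLU N, from NA
    "\u0D7C": "\u0D30",  # CHILLU RR, from RA
    "\u0D7D": "\u0D32",  # CHILLU L, from LA
    "\u0D7E": "\u0D33",  # CHILLU LL, from LLA
    "\u0D7F": "\u0D15",  # CHILLU K, from KA
}


def to_sequences(text):
    """Replace each atomic chillu with consonant + virama + ZWJ."""
    return "".join(
        CHILLU_BASE[ch] + VIRAMA + ZWJ if ch in CHILLU_BASE else ch
        for ch in text
    )


def to_atomic(text):
    """Replace each consonant + virama + ZWJ sequence with its atomic chillu."""
    for chillu, base in CHILLU_BASE.items():
        text = text.replace(base + VIRAMA + ZWJ, chillu)
    return text
```

Note that the two forms are not canonically equivalent, so Unicode normalization will not convert between them; an explicit mapping like this one is needed.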

>> Use it


>> Use it

Picture of the page in action.

About the tool: BCP 47 language tags are built from subtags in the IANA Subtag Registry. This tool helps you find or look up subtags and check for errors in language tags. It also provides information to guide your choices.

Latest changes: I reworked the informational text that accompanies macrolanguages, their encompassed languages, and extlang subtags. As part of that, I changed the code to allow for highlighting of specific cases, for example where legacy usage may dictate that the macrolanguage subtag (zh) is more useful for Mandarin Chinese than the more specific tags (cmn or zh-cmn).

I simplified the intro to the page, but added a link to the new article Choosing a Language Tag, which provides useful step-by-step guidelines on creating language tags.

I also changed the user interface somewhat. The input fields are easier to work with and take up less vertical space. Also, you can now submit a query by simply hitting return after typing into a field. I had originally required you to click on a submit button so that all values in other fields would be retained when the answer is shown – this was so that while checking various subtags you could build up a language tag in the Check field for later checking. I just found that the annoyance of continually having to resubmit after forgetting to click on the submit button wasn’t worth the extra functionality (and I was also encouraged to do so by feedback from Bert Bos).

>> See what it can do

>> Use it

Picture of a part of the page.

It took me a while to find the time, but I have finally upgraded UniView to support the final 5.2 release of Unicode, plus a few extra features.

The order of blocks listed in the top left pulldown menu was changed to resemble the order in the Unicode Charts page. Several sub-block selections were also added to the list (as in the Unicode page), and are displayed in italics.

When you display details of a character in the right panel, the heading Script group is now used to indicate the sub-block-level headings in the block listings of the Unicode Standard. The link to the Unicode block now follows the heading Unicode block. These sub-block-level headings are also shown when you display a range as a list (as opposed to a matrix).

When you mouse over characters displayed in a matrix, the codepoint and name information for that character now appear just above the matrix. This makes it much easier to locate characters you are looking for.

Last, but by no means least, small and large graphics are now available for all 1071 Egyptian Hieroglyph characters. This was the last block for which graphics were completely unavailable.

I removed the ‘beta’ from the version number and replaced it with .0.1. The new version now converts u+… (ie. lowercase u) as well as U+….
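Accepting both spellings of the prefix is a small change in a pattern-based parser. The following sketch shows one way to do it and is not the converter's implementation: the pattern, the 4 to 6 hex digit limit and the name `expand_u_plus` are assumptions made for the example.

```python
import re

# 'U+' or 'u+' followed by 4 to 6 hexadecimal digits.
U_PLUS = re.compile(r"[Uu]\+([0-9A-Fa-f]{4,6})")


def _to_char(match):
    value = int(match.group(1), 16)
    # Values beyond the Unicode range are left in the text untouched.
    return chr(value) if value <= 0x10FFFF else match.group(0)


def expand_u_plus(text):
    """Replace U+XXXX / u+XXXX notations in text with the characters."""
    return U_PLUS.sub(_to_char, text)


print(expand_u_plus("u+0041 U+00E9"))
```

Matching the prefix with the character class `[Uu]` keeps the hex digits themselves case-insensitive as well, without needing a global ignore-case flag.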

See http://rishida.net/tools/conversion/

Thanks to Martin Dürst for the suggestion.

>> Use it

Picture of the page in action.

I have added a bunch of additional new features to my lookup tool to help with choosing language tags. There is additional information available when you look up subtags (such as what to use if the subtag is deprecated, and what subtags macrolanguages enclose, etc.), and more tests on well-formedness with clearer explanations of the problem. Example.

This should make it a lot more useful to people who haven’t read BCP 47 and want to create language tags. Hopefully, in a short while, I’ll also write and link to an article that describes how to use subtags from the ground up in a procedural way, that will complement the tool.

For further assistance, you can now link from a language subtag result to the SIL Ethnologue, to make it easier to check whether that subtag really does refer to the language you were thinking of.

In addition, script subtag results link to Unicode blocks in UniView.

>> Use it

Picture of the page in action.

The IANA Subtag Registry has been recently updated to contain 220 extlang subtags and the ISO 639-3 language subtags, taking the total number of subtags to almost 8,000.

I have produced a new version of my lookup tool to help with language tagging. In addition to helping you find subtags and look up the meaning of subtags, it now helps check the well-formedness of a language tag.

The tool provides access to all currently defined subtags, including the new extlang subtags.
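To give an idea of what a subtag lookup works from, here is a minimal sketch that parses records in the registry's text format (records separated by `%%`, one `Field: value` pair per line) and looks up a subtag. It is not the tool's code. The embedded sample is abridged by hand (for instance, the Added dates are omitted), folded continuation lines are not handled, and the names `parse_registry` and `lookup` are hypothetical.

```python
# Abridged, hand-copied sample in the registry's record format.
SAMPLE = """\
%%
Type: language
Subtag: zh
Description: Chinese
Scope: macrolanguage
%%
Type: language
Subtag: cmn
Description: Mandarin Chinese
Macrolanguage: zh
%%
Type: extlang
Subtag: cmn
Description: Mandarin Chinese
Preferred-Value: cmn
Prefix: zh
Macrolanguage: zh
"""


def parse_registry(text):
    """Split the text into records; each field maps to a list of values."""
    records = []
    for chunk in text.split("%%"):
        record = {}
        for line in chunk.strip().splitlines():
            field, _, value = line.partition(":")
            # Fields such as Description may repeat, so collect lists.
            record.setdefault(field.strip(), []).append(value.strip())
        if record:
            records.append(record)
    return records


def lookup(records, subtag):
    """Return every record whose Subtag field matches, ignoring case."""
    wanted = subtag.lower()
    return [r for r in records
            if wanted in (s.lower() for s in r.get("Subtag", []))]


records = parse_registry(SAMPLE)
for record in lookup(records, "cmn"):
    print(record["Type"][0], record["Description"][0])
```

Looking up cmn returns two records, one of type language and one of type extlang, which is why the type of a subtag has to be reported alongside its meaning.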

Parsing language tags. In addition to trying to make the user interface more friendly, I also added the ability to parse hyphenated tags and discover their structure and check for errors. I’m not claiming with this release that the new parser field tests all the corner cases, but it should provide reports for most of the typical errors.

It reports errors for the following:

– subtags that are not in the registry (by type)
– incorrectly ordered subtags
– duplicate variant tags and multiple tags of other types
– overlong private use subtags

Try this example.

It doesn’t yet handle extensions, but then there aren’t any valid ones to handle yet anyway.
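The checks listed above can be approximated with a short shape-based parser. This is a simplified sketch, not the tool's parser: it classifies subtags purely by their length and character type following the BCP 47 syntax, does no registry lookup, and ignores extensions, grandfathered tags and tags that are entirely private use. The name `check_tag` and the error wording are hypothetical.

```python
import re

# Subtag shapes from the BCP 47 syntax, listed in their required order.
SHAPES = [
    ("extlang", re.compile(r"[a-z]{3}")),
    ("script", re.compile(r"[a-z]{4}")),
    ("region", re.compile(r"[a-z]{2}|[0-9]{3}")),
    ("variant", re.compile(r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}")),
]
RANK = {"language": 0, "extlang": 1, "script": 2, "region": 3, "variant": 4}


def check_tag(tag):
    """Return a list of error messages for a hyphenated language tag."""
    errors = []
    subtags = tag.lower().split("-")
    if not re.fullmatch(r"[a-z]{2,3}", subtags[0]):
        errors.append("bad primary language subtag: " + subtags[0])
    highest = RANK["language"]
    seen_kinds, seen_variants = set(), set()
    for i, sub in enumerate(subtags[1:], start=1):
        if sub == "x":  # everything after 'x' is private use
            for private in subtags[i + 1:]:
                if len(private) > 8:
                    errors.append("overlong private use subtag: " + private)
            break
        kind = next((k for k, pat in SHAPES if pat.fullmatch(sub)), None)
        if kind is None:
            errors.append("unrecognised subtag: " + sub)
        elif RANK[kind] < highest:
            errors.append("incorrectly ordered subtag: " + sub)
        elif kind == "variant":
            if sub in seen_variants:
                errors.append("duplicate variant subtag: " + sub)
            seen_variants.add(sub)
        elif kind in seen_kinds:
            errors.append("multiple %s subtags: %s" % (kind, sub))
        if kind is not None:
            seen_kinds.add(kind)
            highest = max(highest, RANK[kind])
    return errors


for example in ("zh-Hans-CN", "zh-CN-Hans", "sl-rozaj-rozaj"):
    print(example, check_tag(example))
```

Classifying by shape alone is enough to catch ordering and duplication problems; deciding whether a well-formed subtag actually exists still requires the registry.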

I hope that’s useful.

>> See what it can do

>> Use it

Picture of the page in action.

Following hot on the heels of the last release come some further significant changes to UniView aimed at making it easier to use as Unicode grows.

The big change is that UniView now starts up in graphics mode by default. This means that pages load more slowly, but (especially with the continuing growth of Unicode) also means that you are more likely to be able to see the characters you are looking for. It’s easy to switch between modes at any point, using the “Use graphics” checkbox. (And if you preferred font glyphs as a default, you just need to change the URI in your bookmarked link slightly, and you can continue to work that way.)

To facilitate this change, I created my own graphics for a number of blocks which are not yet covered by decodeunicode, or which are no longer fully covered by decodeunicode. The blocks for which I provided graphics are Latin Extended-C, Latin Extended-D, Latin Extended Additional, Cyrillic Supplement, Cyrillic Extended-B, Modifier Tone Letters, Tibetan, Malayalam, Saurashtra, Ol Chiki, Myanmar, Kayah Li, Cham, Rejang, Vai, Supplemental Punctuation, and Miscellaneous Symbols and Arrows.

There are still many characters for which there are no graphics (especially the new characters in Unicode 5.2), but coverage is much better than it was. As I find more fonts, I will be able to create graphics for the remaining characters.

I also put a grey box around the characters in tables. This is particularly useful if there are no graphics or font glyphs for a block or range of characters, as it makes it easier to locate the character you are looking for.

I also fixed a bug that was preventing Chrome, Safari and IE from displaying the first two Latin blocks. I think the bug was actually in the Unicode data file.
