Dochula Pass, Bhutan

Picture of the page in action.

>> Use UniView

The major change in this update is the update of the data to support Unicode version 6.1.0, which should be released today. (See the list of links to new Unicode blocks below.)

There are also a number of feature and bug related changes.

What UniView does: Look up and see characters (using graphics or fonts) and property information, view whole character blocks or custom ranges, select characters to paste into your document, paste in and discover unknown characters, search for characters, do hex/dec/ncr conversions, highlight character types, etc. etc. Supports Unicode 6.1 and written with Web Standards to work on a variety of browsers. No need to install anything.

List of changes:

  • One significant change enables you to display information in a separate window, rather than overwriting the information currently displayed. This can be done by typing/pasting/dragging a set of characters or character code values into the new Popout area and selecting the  icon alongside the Characters or Copy & paste input fields (depending on what you put in the popout window).

  • Two new icons were added to the Copy & paste area:

    Analyse Clicking on this will display the characters in the area in the lower right part of the page with all relevant characters converted to uppercase, lowercase and titlecase. Characters that had no case conversion information are also listed.

    Analyse Clicking on this produces the same kind of output as clicking on the icon just above, but shows the mappings for those characters that have been changed, eg. e→E.

  • Where character information displayed in the lower right panel has a case or decomposition mapping, UniView now displays the characters involved, rather than just giving the hex value(s), eg. Uppercase mapping: 0043 C. You will need a font on your system to see the characters displayed in this way, but whether or not you have a font, this provides a quick and easy way to copy the case-changed character (rather than having to copy the hex value and convert it first).

  • There is also a new line, slightly further down, when UniView is in graphic mode. This line starts with ‘As text:’, and shows the character using whatever default font you have on your system. Of course, if you don’t have a font that includes that character you won’t see it. This has been added to make it easier to copy and paste a character into text.

  • There is also a new line, slightly further down, when UniView is in graphic mode. This line starts with ‘As text:’, and shows the character using whatever default font you have on your system. Of course, if you don’t have a font that includes that character you won’t see it. This has been added to make it easier to copy and paste a character into text.

  • Fixed some small bugs, such as problems with search when U+29DC INCOMPLETE INFINITY is returned.

Enjoy.

Here are direct links to the new blocks added to Unicode 6.1:

One of the more useful features of UniView is its ability to list the characters in a string with names and codepoints. This is particularly useful when you can’t tell what a string of characters contains because you don’t have a font, or because the script is too complex, etc.

'ishida' in Persian in  nastaliq font style

For example, I was recently sent an email where my name was written in Persian as ایشی‌دا. The image shows how it looks in a nastaliq font.

To see the component characters, drop the string into UniView’s Copy & Paste field and click on the downwards pointing arrow icon. Here is the result:

list of characters

Note how you can now see that there’s an invisible control character in the string. Note also that you see a graphic image for each character, which is a big help if the string you are investigating is just a sequence of boxes on your system.

Not only can you discover characters in this way, but you can create lists of characters which can be pasted into another document, and customise the format of those lists.

Pasting the list elsewhere

If you select this list and paste it into a document, you’ll see something like this:

  0627  ARABIC LETTER ALEF
  06CC  ARABIC LETTER FARSI YEH
  0634  ARABIC LETTER SHEEN
  06CC  ARABIC LETTER FARSI YEH
  200C  ZERO WIDTH NON-JOINER
  062F  ARABIC LETTER DAL
  0627  ARABIC LETTER ALEF

You can make the characters appear by deselecting Use graphics on the Look up tab. (Of course, you need an arabic font to see the list as intended.)

ا  ‎0627  ARABIC LETTER ALEF
ی  ‎06CC  ARABIC LETTER FARSI YEH
ش  ‎0634  ARABIC LETTER SHEEN
ی  ‎06CC  ARABIC LETTER FARSI YEH
‌  ‎200C  ZERO WIDTH NON-JOINER
د  ‎062F  ARABIC LETTER DAL
ا  ‎0627  ARABIC LETTER ALEF

Customising the list format

What may be less obvious is that you can also customise the format of this list using the settings under the Options tab. For example, using the List format settings, I can produce a list that moves the character column between the number and the name, like this:

  0627  ا  ARABIC LETTER ALEF
  ‎06CC  ی  ARABIC LETTER FARSI YEH
  ‎0634  ش  ARABIC LETTER SHEEN
  ‎06CC  ی  ARABIC LETTER FARSI YEH
  ‎200C  ‌  ZERO WIDTH NON-JOINER
  ‎062F  د  ARABIC LETTER DAL
  ‎0627  ا  ARABIC LETTER ALEF

Or I can remove one or more columns from the list, such as:

  ا  ARABIC LETTER ALEF
  ی  ARABIC LETTER FARSI YEH
  ش  ARABIC LETTER SHEEN
  ی  ARABIC LETTER FARSI YEH
  ‌  ZERO WIDTH NON-JOINER
  د  ARABIC LETTER DAL
  ا  ARABIC LETTER ALEF

With the option Show U+ in lists I can also add or remove the U+ before the codepoint value. For example, this lets me produce the following list:

  ‎U+0627  ARABIC LETTER ALEF
  ‎U+06CC  ARABIC LETTER FARSI YEH
  ‎U+0634  ARABIC LETTER SHEEN
  ‎U+06CC  ARABIC LETTER FARSI YEH
  ‎U+200C  ZERO WIDTH NON-JOINER
  ‎U+062F  ARABIC LETTER DAL
  ‎U+0627  ARABIC LETTER ALEF

Other lists in UniView

We’ve shown how you can make a list of characters in the Cut & Paste box, but don’t forget that you can create lists for a Unicode block, custom range of characters, search list results, or list of codepoint values, etc. And not only that, but you can filter lists in various ways.

Here is just one quick example of how you can obtain a list of numbers for the Devanagari script.

  1. On the Look up tab, select Devanagari from the Unicode block pull down list.
  2. Select Show range as list and deselect (optional) Use graphics.
  3. Under the Filter tab, select Number from the Show properties pull down list.
  4. Click on Make list from highlights

You end up with the following list, that you can paste into your document.

०  ‎0966  DEVANAGARI DIGIT ZERO
१  ‎0967  DEVANAGARI DIGIT ONE
२  ‎0968  DEVANAGARI DIGIT TWO
३  ‎0969  DEVANAGARI DIGIT THREE
४  ‎096A  DEVANAGARI DIGIT FOUR
५  ‎096B  DEVANAGARI DIGIT FIVE
६  ‎096C  DEVANAGARI DIGIT SIX
७  ‎096D  DEVANAGARI DIGIT SEVEN
८  ‎096E  DEVANAGARI DIGIT EIGHT
९  ‎096F  DEVANAGARI DIGIT NINE

(Of course, you can also customise the layout of this list as described in the previous section.)

Try it out.

Reversing the process: from list to string

To complete the circle, you can also cut & paste any of the lists in the blog text above into UniView, to explore each character’s properties or recreate the string.

Select one of the lists above and paste it into the Characters input field on the Look up tab. Hit the downwards pointing arrow icon alongside, and UniView will recreate the list for you. Click on each character to view detailed information about it.

If you want to recreate the string from the list, simply click on the upwards pointing arrow icon below the Copy & paste box, and the list of characters will be reconstituted in the box as a string.

Voila!

Picture of the page in action.

The ‘i18n checker’ is a free service by the W3C that provides information about internationalization-related aspects of your HTML page, and advice on how to improve your use of markup, where needed, to support the multilingual Web.

This latest release uses a new user interface and redesigned source code. It also adds a number of new tests, a file upload facility, and support for HTML5.

This is still a ‘pre-final’ release and development continues. There are already plans to add further tests and features, to translate the user interface, to add support for XHTML5 and polyglot documents, to integrate with the W3C Unicorn checker, and to add various other features. At this stage we are particularly interested in receiving user feedback.

Try the checker and let us know if you find any bugs or have any suggestions.

Picture of the page in action.

>> Use UniView

About the tool: Look up and see characters (using graphics or fonts) and property information, view whole character blocks or custom ranges, select characters to paste into your document, paste in and discover unknown characters, search for characters, do hex/dec/ncr conversions, highlight character types, etc. etc. Supports Unicode 6.0 and written with Web Standards to work on a variety of browsers. No need to install anything.

Latest changes: The majority of changes in this update relate to the user interface. They include the following:

  • Many controls have been grouped under three tabs: Look up, Filter, and Options. Various previously dispersed controls were gathered together under the Filter and Options tabs. Many of the controls have been slightly renamed.
  • The Search control has been moved to the top right of the window, where it is always visible.
  • The old Text Area is now a Copy & Paste control that has a 2-dimensional input box. In browser such as Safari, Chrome and Firefox 4, this box can be stretched by the user to whatever size is preferred.
  • The icon that provides a toggle switch between revealing detailed information for a character in a list or table, or copying that character to the Copy & Paste box has been redesigned. It stands alone and indicates the location of the current outcome using arrows.
    It looks like this: with the two arrows or this with the two arrows.
  • Title text has been provided for all controls, describing briefly what that control does. You can see this information by hovering over the control with the mouse.

Many of these changes were introduced to make it a little easier for newcomers to get to grips with UniView.

There were also some feature changes:

  • The ‘Codepoints’ control was converted to accept text as well as code points and renamed ‘Characters’. By default the control expect hex code point values, but this can be switched using the radio buttons. For text, you would usually use the ‘Copy & Paste’ control, but if you want to check out some characters without disturbing the contents of that control, you can now do so by setting the ‘Character’ radio button on the ‘Characters’ control.
  • The control to look up characters in the Unihan database the icon that looks like a Japanese character was fixed, but also extended to handle multiple characters at a time, opening a separate window for each character. (UniView warns you if you try to open more than 5 windows.)
  • The control to send characters to the Unicode Conversion tool the icon with overlapping boxes was fixed and now puts the character content of the field in the green box of the Converter Tool. If you need to convert hex or decimal code point values, do that in the converter.
  • The Show Age feature now works with lists, not just tables.

It has always been possible to pass a string to the converter in the URI, but that was never documented.

Now it is, and you can pass a string using the q parameter. For example, http://rishida.net/tools/conversion/?q=Crêpes. You can also pass a string with escapes in it, but you will need to be especially careful to percent escape characters such as &, + and # which affect the URI syntax. For example, http://rishida.net/tools/conversion/?q=CrU%2B00EApes.

>> Use it

Picture of the page in action.

About the tool: BCP 47 language tags are built from subtags in the IANA Subtag Registry. This tool helps you find or look up subtags and check for errors in language tags. It also provides information to guide your choices.

Latest changes: I reworked the informational text that accompanies macrolanguages, their encompassed languages, and extlang subtags. As part of that, I changed the code to allow for highlighting of specific cases. For example, where legacy may dictate that the macrolanguage subtag (zh) is more useful for Mandarin Chinese than the more specific tags (cmn or zh-cmn).

I simplified the intro to the page, but added a link to the new article Choosing a Language Tag, which provides useful step-by-step guidelines on creating language tags.

I also changed the user interface somewhat. The input fields are easier to work with and take up less vertical space. Also, you can now submit a query by simply hitting return after typing into a field. I had originally required you to click on a submit button so that all values in other fields would be retained when the answer is shown – this was so that while checking various subtags you could build up a language tag in the Check field for later checking. I just found that the annoyance of continually having to resubmit after forgetting to click on the submit button wasn’t worth the extra functionality (and I was also encouraged to do so by feedback from Bert Bos).

>> Use it

Picture of the page in action.

I have added a bunch of additional new features to my lookup tool to help with choosing language tags. There is additional information available when you look up subtags (such as what to use if the subtag is deprecated, and what subtags macrolanguages enclose, etc.), and more tests on well-formedness with clearer explanations of the problem. Example.

This should make it a lot more useful to people who haven’t read BCP 47 and want to create language tags. Hopefully, in a short while, I’ll also write and link to an article that describes how to use subtags from the ground up in a procedural way, that will complement the tool.

For further assistance, you can now link from a language subtag result to the SIL Ethnologue, to make it easier to check whether that subtag really does refer to the language you were thinking of.

In addition, script subtag results link to Unicode blocks in UniView.

>> Use it

Picture of the page in action.

The IANA Subtag Registry has been recently updated to contain 220 extlang subtags and the ISO 639-3 language subtags, taking the total number of subtags to almost 8,000.

I have produced a new version of my lookup tool to help with language tagging. In addition to helping you find subtags and lookup the meaning of subtags, it now helps check the well-formedness of a language tag.

The tool provides access to all currently defined subtags, including the new extlang subtags.

Parsing language tags. In addition to trying to make the user interface more friendly, I also added the ability to parse hyphenated tags and discover their structure and check for errors. I’m not claiming with this release that the new parser field tests all the corner cases, but it should provide reports for most of the typical errors.

It reports errors for the following:

– subtags that are not in the registry (by type)
– incorrectly ordered subtags
– duplicate variant tags and multiple tags of other types
– overlong private use subtags

Try this example.

It doesn’t yet handle extensions, but then there aren’t any valid ones to handle yet anyway.

I hope that’s useful.

>> Use it !

Picture of the page in action.

This tool allows you to search for subtags that have, say, ‘french’ in their description (there are currently 11), or to find out what that mysterious ‘ch’ subtag stands for (there are 2 possibilities).

Update: You can now also search for a hyphen-separated sequences of subtags, such as sl-IT-nedis and find out what each of the component subtags mean.

Alternatively, you can simply list all current language tags, or script tags, or variants, etc.

For months I’ve been wanting to write a small, Web-based tool for finding things in the subtag registry without having to work on the (for many people, intimidating) raw text file on the IANA site.

Tom Gruner created an initial tool for pretty printing the IANA list, which handled enough of the basics to allow me to use the little time I have these days to add the search functionality on top.

If you have JavaScript running, you are shown just the tags and descriptions initially, but by clicking on those you can reveal all additional information in the registry for a given tag. I also highlight tags that are deprecated, so you can see that straight away.

(PS: Some final tweaks to the code will come when I have a spare moment for things like making the expanding list more accessible, etc.)

« Previous Page