Dochula Pass, Bhutan

Picture of the page in action.

An updated version of the Unicode Character Converter web app is now available. This app allows you to convert characters between various different formats and notations.

Significant changes include the following:

  • It’s now possible to generate EcmaScript6 style escapes for supplementary characters in the JavaScript output field, eg. \u{10398} rather than \uD800\uDF98.
  • In many cases, clicking on a checkbox option now applies the change straight away if there is content in the associated output field. (There are 4 output fields where this doesn’t happen because we aren’t dealing with escapes and there are problems with spaces and delimiters.)
  • By default, the JavaScript output no longer escapes the ASCII characters that can be represented by \n, \r, \t, \’ and \”. A new checkbox is provided to force those transformations if needed. This should make the JS transform much more useful for general conversions.
  • The code to transform to HTML/XML can now replace RLI, LRI, FSI and PDI if the Convert bidi controls to HTML markup option is set.
  • The code to transform to HTML/XML can convert many more invisible or ambiguous characters to escapes if the Escape invisible characters option is set.
  • UTF-16 code points are all at least 4 digits long.
  • Fixed a bug related to U+00A0 when converting to HTML/XML.
  • The order of the output fields was changed, and various small improvements were made to the user interface.
  • Revamped and updated the notes

Many thanks to the people who wrote in with suggestions.

Picture of the page in action.

UniView now supports Unicode version 9, which is being released today, including all changes made during the beta period. (As before, images are not available for the Tangut additions, but the character information is available.)

This version of UniView also introduces a new filter feature. Below each block or range of characters is a set of links that allows you to quickly highlight characters with the property letter, mark, number, punctuation, or symbol. For more fine-grained property distinctions, see the Filter panel.

In addition, for some blocks there are other links available that reflect tags assigned to characters. This tagging is far from exhaustive! For instance, clicking on sanskrit will not show all characters used in Sanskrit.

The tags are just intended to be an aid to help you find certain characters quickly by exposing words that appear in the character descriptions or block subsection titles. For example, if you want to find the Bengali currency symbol while viewing the Bengali block, click on currency and all other characters but those related to currency will be dimmed.

(Since the highlight function is used for this, don’t forget that, if you happen to highlight a useful subset of characters and want to work with just those, you can use the Make list from highlights command, or click on the upwards pointing arrow icon below the text area to move those characters into the text area.)