
>> Use it!

Picture of the page in action.

This web-based tool helps you convert between a number of Unicode escape and code formats.

Changes in the new version:

  • Convert from JavaScript, Java and C escape notation, and to JavaScript/Java escapes (with a switch to show supplementary characters as C-style escapes)
  • Convert to and from CSS escape notation
  • Convert from HTML/XML code with escapes to code with just characters
  • Convert <, >, " or & in HTML/XML code to entities
  • Option to show ASCII characters when converting to NCRs
  • View a set of characters in UniView by clicking on the View in UniView button
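The escape formats in the list above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own code: the names `to_js_escapes`, `from_escapes` and `to_ncrs` are hypothetical. It assumes that JavaScript/Java escapes represent supplementary characters as UTF-16 surrogate pairs, that the C-style switch writes them as a single `\UXXXXXXXX` escape, and that NCRs are hexadecimal. Only `\u` and `\U` escapes are decoded; C's `\x` and octal escapes are not handled.

```python
import re

def to_js_escapes(text: str, c_style_supplementary: bool = False) -> str:
    # Every character becomes \uXXXX. Characters above U+FFFF become a UTF-16
    # surrogate pair, or one C-style \UXXXXXXXX escape if the switch is set.
    out = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")
        elif c_style_supplementary:
            out.append(f"\\U{cp:08X}")
        else:
            v = cp - 0x10000
            high, low = 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)
            out.append(f"\\u{high:04X}\\u{low:04X}")
    return "".join(out)

_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")

def from_escapes(text: str) -> str:
    # Replace each escape with the code unit/point it names, then let the
    # UTF-16 codec join surrogate pairs (lone surrogates pass through).
    raw = _ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)
    return raw.encode("utf-16-le", "surrogatepass").decode("utf-16-le", "surrogatepass")

def to_ncrs(text: str, keep_ascii: bool = False) -> str:
    # Hexadecimal NCRs; optionally leave ASCII characters as they are.
    return "".join(
        ch if keep_ascii and ord(ch) < 0x80 else f"&#x{ord(ch):X};" for ch in text
    )
```

For example, `to_js_escapes("A😀")` gives `\u0041\uD83D\uDE00`, while turning on the C-style switch gives `\u0041\U0001F600`.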

For CSS output I chose the 6-figure version with no optional space, since I thought it was clearest. I’ve had a request to change it to the shortest form (4 or 6 figures) followed by a space. If other people prefer that, I may change it.

Update: Markus Scherer convinced me to change the CSS output. Rather than 6-figure escapes with no space, the output now uses 6-figure escapes followed by a space for supplementary characters, and 4-figure escapes followed by a space for all other characters.
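Why the space matters: a CSS escape is a backslash followed by one to six hex digits, and a single whitespace character after the digits is consumed as a terminator. Without it, `\E9a` would be read as U+0E9A rather than é followed by "a"; a 6-digit escape needs no terminator because the parser stops after six digits. The Python sketch below (hypothetical function names, not the tool's code) produces the updated output format and decodes in the other direction. It handles only hex escapes and, as the CSS Syntax rules do, maps zero, surrogates and out-of-range values to U+FFFD.

```python
import re

def to_css_escapes(text: str) -> str:
    out = []
    for ch in text:
        cp = ord(ch)
        digits = 6 if cp > 0xFFFF else 4  # 6 for supplementary, else 4
        out.append(f"\\{cp:0{digits}X} ")  # trailing space ends the escape
    return "".join(out)

# A backslash, 1-6 hex digits, then at most one whitespace terminator
# (CRLF counts as one, as after CSS input preprocessing).
_CSS_HEX_ESCAPE = re.compile(r"\\([0-9A-Fa-f]{1,6})(?:\r\n|[ \t\n\r\f])?")

def _decode_one(m) -> str:
    cp = int(m.group(1), 16)
    # NUL, surrogates and values above U+10FFFF become U+FFFD.
    if cp == 0 or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
        return "\uFFFD"
    return chr(cp)

def from_css_escapes(text: str) -> str:
    # Only hex escapes are decoded; escapes such as \" are left untouched.
    return _CSS_HEX_ESCAPE.sub(_decode_one, text)
```

For example, `to_css_escapes("A😀")` gives `\0041 \01F600 `, and `from_css_escapes("caf\\E9 a")` gives `caféa`, because the space is swallowed as the terminator.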

>> See what it can do!

>> Use it!

Picture of the page in action.

I found a little more time to work on UniView while flying to the US for the I18n & Unicode Conference yesterday, and added a number of useful features.

Changes include:

  • Extended the ability to open UniView with data displayed from a URI. In addition to specifying a block and a character, you can now specify a range, a list of codepoints, a list of characters, or a search string. This is useful for pointing people to results using URIs in links or email.
  • Switching between graphics and fonts for displaying characters now also refreshes the right panel.
  • Clicking on the information about the script group of a character displayed in the right panel will cause that block to be displayed in the left panel. This is particularly useful when you find a single character and want to know what’s around it.
  • Block names in URI queries now use underscores or %20 for spaces, instead of hyphens. This may break some existing URIs, but it fixes a bug that prevented blocks whose names actually contain hyphens from displaying.
  • Added an option to the right hand panel to display the current character in the Unicode Conversion tool.
  • Fixed some other bugs related to specifying Basic Latin block in a URI.
  • Reinstated CJK Unified Ideographs and Hangul Syllables in the block selection pull-down, but added a warning and an opt-out if the block you are about to display contains more than 2000 characters. There is also now a warning and opt-out if you try to specify a range of more than 2000 characters.
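The hyphen problem is easy to see with a block such as Latin-1 Supplement: if spaces are written as hyphens, "Latin-1-Supplement" decodes back to "Latin 1 Supplement". Underscores avoid the clash. A minimal Python sketch of the encoding, plus the 2000-character size check, might look like this. The function names, and the assumption that a block name travels as a single query value, are hypothetical and not UniView's actual URI syntax. Decoding relies on block names never containing underscores, which is true of the names in Unicode's Blocks.txt.

```python
from urllib.parse import quote, unquote

WARN_LIMIT = 2000  # characters, as in the warning described above

def block_name_to_query(name: str) -> str:
    # Spaces become underscores; hyphens (as in "Latin-1 Supplement") are
    # kept, and anything else unsafe is percent-encoded.
    return quote(name.replace(" ", "_"), safe="")

def block_name_from_query(value: str) -> str:
    # Accepts either underscores or %20 for spaces.
    return unquote(value).replace("_", " ")

def needs_size_warning(first: int, last: int, limit: int = WARN_LIMIT) -> bool:
    # True if the inclusive range first..last spans more than `limit` code
    # points (this counts code points, not only assigned characters).
    return last - first + 1 > limit
```

With this scheme, "Latin-1 Supplement" round-trips as `Latin-1_Supplement`, and a request for the Hangul Syllables block (U+AC00..U+D7AF) would trigger the warning.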

Please report any bugs to me, and don’t forget to refresh any UniView files in your cache before using the new version.

>> See what it can do!

>> Use it!

Picture of the page in action.

In little pockets of time recently I’ve been making some significant improvements to my UniView tool, the character map on steroids.

Changes include:

  • Substantially revised the code so that handling of ideographic, Hangul and other characters not in the Unidata database is much improved. For example, ideographs now display in the left panel for a specified range, and their property values are available in the right panel.
  • Added regular expression support to the search input field.
  • Changes to the user interface: moved highlighting controls to the initial screens, and moved others, such as the chart numbering toggle, to the submenu under “Options”; provided wider input fields for codepoint and cut&paste input; replaced the graphics and list toggle icons with checkboxes; provided an icon to quickly clear the contents of the codepoint and cut&paste input fields. A UniHan database icon was added alongside the Cut & paste input field: clicking it looks up the first character in either field. A link to the UniHan database was also added to the right panel when a Unified CJK character is displayed there.
  • The Codepoint input field now accepts more than one codepoint (separated by spaces).
  • When you double-click a character in the left panel, its codepoint is appended to the Codepoint input field and the character is added to the Cut & paste field.
  • Clicking the Show as graphics checkbox now immediately applies the change to whatever is in the left panel. If you are looking at, say, a list of characters generated by the Codepoint input, it redisplays that same list rather than the range.
  • Set the default font to “Arial Unicode MS, sans-serif”.
  • Added a message for those who do not have JavaScript turned on, and messages asking you to wait while data is downloaded on initial startup.
  • Fixed the icons linking to the converter tool, so that the contents of the adjacent field are passed to the converter and converted automatically.
  • Added links in the right panel to FileFormat pages (in addition to decodeUnicode). The FileFormat pages provide useful information about a given character for Java and .NET users.
  • Removed the option to specify your own character notes, because AJAX technology will not allow an XML file to be included from another domain. (I’m not aware that anyone ever used it: it hasn’t worked for a while now and no-one has complained.) I will reinstate it when that is fixed.
  • Fixed a number of other bugs, particularly related to supplementary character support and highlighting.
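Hangul syllables and CJK unified ideographs are not listed one by one in the Unicode Character Database; their names are derived by rule, which is one reason they need special handling. The Python sketch below derives those names (using the Hangul syllable name algorithm in the Unicode Standard), then uses them in a regular-expression name search and parses space-separated codepoint input. It is an illustration, not UniView's code. The function names are hypothetical; the search is assumed to be case-insensitive and to match character names only; the optional `U+` prefix is an assumed input format; and the ideograph rule assumes every code point in U+4E00..U+9FFF is assigned, which holds in recent Unicode versions.

```python
import re
import unicodedata
from typing import List, Optional

# Constants of the Hangul syllable name algorithm.
S_BASE, L_COUNT, V_COUNT, T_COUNT = 0xAC00, 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 syllables in total

JAMO_L = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S", "SS", "",
          "J", "JJ", "C", "K", "T", "P", "H"]
JAMO_V = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA", "WAE",
          "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]
JAMO_T = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG", "LM", "LB",
          "LS", "LT", "LP", "LH", "M", "B", "BS", "S", "SS", "NG", "J", "C",
          "K", "T", "P", "H"]

def derived_name(cp: int) -> Optional[str]:
    # Names derived by rule rather than listed individually.
    if S_BASE <= cp < S_BASE + S_COUNT:
        s = cp - S_BASE
        lead, vowel, tail = s // N_COUNT, (s % N_COUNT) // T_COUNT, s % T_COUNT
        return "HANGUL SYLLABLE " + JAMO_L[lead] + JAMO_V[vowel] + JAMO_T[tail]
    if 0x4E00 <= cp <= 0x9FFF:  # assumes the whole block is assigned
        return f"CJK UNIFIED IDEOGRAPH-{cp:04X}"
    return None

def name_of(cp: int) -> str:
    return derived_name(cp) or unicodedata.name(chr(cp), "")

def search(pattern: str, first: int, last: int) -> List[int]:
    # Case-insensitive regex match against names in the inclusive range.
    rx = re.compile(pattern, re.IGNORECASE)
    return [cp for cp in range(first, last + 1) if rx.search(name_of(cp))]

def parse_codepoints(field: str) -> List[int]:
    # Space-separated hex codepoints; a leading "U+" is optional.
    return [int(tok[2:] if tok.upper().startswith("U+") else tok, 16)
            for tok in field.split()]
```

For example, `name_of(0xD55C)` gives `HANGUL SYLLABLE HAN`, and `parse_codepoints("U+0041 1F600")` gives `[0x41, 0x1F600]`.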

Please report any bugs to me, and don’t forget to refresh any UniView files in your cache before using the new version.

I’m at the ITS face-to-face meeting in Prague, Czech Republic, and I’ve been trying to learn to read Czech words. Jirka Kosek showed me a Czech tongue-twister at dinner last night.

Strč prst skrz krk.

How amazing is that? A whole sentence without vowels! (It means “Put your finger down your throat.” I’m wondering whether that has something to do with the missing vowels…)

See a video of Jirka pronouncing it.