UniView Help & User Guide

UniView is an XHTML-based application to look up characters and character blocks, paste in and discover unknown characters, store your own info about characters, search on character data, do hex/dec/NCR conversions, highlight character types, etc. It supports Unicode 5.1.0 and is written with Web Standards to work on a variety of browsers.

This help file relates to Version 5.1.0a of UniView. The main change in this version is the speed of the initial download of the page. Since data is fetched using AJAX on a just-in-time basis, you no longer have to download and cache the Unicode database before you start. For details see the change history.

The first three digits of the UniView version number reflect the version of Unicode that it supports. This version therefore supports Unicode version 5.1.0.

With UniView you can...

see a range of Unicode characters

Either select the range from the pull-down list at the top left, or type it using hex numbers into the Custom range text box and click on the icon next to it.

The Custom range field will accept various formats. The numbers must be in hexadecimal form and separated by a colon (the default), a hyphen, one or more spaces, or one or more periods. The numbers can be in the following formats: 1234, &#x1234;, &#1234;, \u1234, U+1234. The actual number of hex digits can be between 1 and 6.
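As a rough illustration of the range formats described above, here is a minimal Python sketch (not UniView's actual code, which is JavaScript). The function name `parse_range` is hypothetical. The sketch assumes that plain hex digits and the U+1234, \u1234 and &#x1234; wrappers are accepted, and it rejects a range whose start is higher than its end.

```python
import re

# An optional wrapper (U+, \u or &#x ... ;) around 1 to 6 hex digits.
_NUMBER = r"(?:U\+|\\u|&#x)?([0-9A-Fa-f]{1,6});?"
# Separator: a colon, a hyphen, one or more spaces, or one or more periods.
_RANGE = re.compile(rf"^\s*{_NUMBER}(?::|-|\s+|\.+){_NUMBER}\s*$")

def parse_range(text):
    """Return (start, end) as integers for a custom range string."""
    match = _RANGE.match(text)
    if match is None:
        raise ValueError(f"not a recognised range: {text!r}")
    start = int(match.group(1), 16)
    end = int(match.group(2), 16)
    if start > end:
        raise ValueError("range start is higher than range end")
    return start, end

print(parse_range("0036:0067"))      # colon separator
print(parse_range("U+0E00-U+0E7F"))  # hyphen separator with U+ prefixes
```

Both numbers are always read as hexadecimal, whatever wrapper is used.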

You can display the result as either a table of characters or a list (that includes names) by clicking the checkbox below entitled Show range as list.

Unassigned character positions are shown in the matrix with a greyed out background (though you can change the colour, if you want).

view characters as font glyphs or graphics

Click on the checkbox next to Show as graphics to toggle between font glyphs and graphics. Using font glyphs (UTF-8 text) loads the page faster, but relies on you having a good Unicode font or selection of fonts to cover the Unicode code points. Graphics will be downloaded from the decodeunicode server by default.

If you are looking for fonts I recommend David McCreedy's Gallery of Unicode Fonts or Alan Wood’s Unicode Resources. You should be able to find free fonts there for most characters. (You can also change the default font in UniView, if you wish.)

search for one or more characters based on text in the Unicode database (eg. you know the name contains 'khwai')

Type in the string you want to search for in the box labeled Search string and hit enter or click on the icon next to it.

You can also use regular expressions in searches. For example, suppose you wanted to find all characters with the word 'tet'. You could type \btet\b into the input field. The \b represents a word boundary. If you wanted to search for entries containing either the word 'tet' or the word 'tat', you can use the 'or' operator |, as in \btet\b|\btat\b.

Another example: You want to search for 'alpha', but you only want results for the Latin characters (not the many Greek or mathematical results). Simply use the following search string latin.*alpha. The .* represents any number of intervening characters.

I haven't tested this feature to destruction, but most basic regular expressions that work in both JavaScript and PHP code should work.
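To see how such patterns behave, here is a small Python sketch that runs them against character names from Python's own `unicodedata` module. This is only an approximation of UniView's search: it looks at names only, the function name `search_names` is hypothetical, and case-insensitive matching is an assumption. These basic constructs (\b, | and .*) behave the same way in Python as in JavaScript and PHP.

```python
import re
import unicodedata

def search_names(pattern, codepoints):
    """Return the code points whose Unicode name matches the pattern."""
    regex = re.compile(pattern, re.IGNORECASE)  # assumption: case is ignored
    hits = []
    for cp in codepoints:
        name = unicodedata.name(chr(cp), "")  # "" for unnamed positions
        if name and regex.search(name):
            hits.append(cp)
    return hits

hebrew = range(0x0590, 0x0600)
ipa = range(0x0250, 0x02B0)

# Whole word 'tet' only, thanks to the \b word boundaries.
print([f"{cp:04X}" for cp in search_names(r"\btet\b", hebrew)])
# 'latin' followed, anywhere later in the name, by 'alpha'.
print([f"{cp:04X}" for cp in search_names(r"latin.*alpha", ipa)])
```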

Note that by default searches match against any information in the main Unicode database, not just character names, and also against the information displayed for an individual character under the heading Description in the right panel. You can limit the search using the Names, Descriptions and Other checkboxes under the search input field.

You can also limit the search to the specific range of characters in the Custom range field. This range can be fixed manually or by selecting a Unicode block just above. To limit the search, select the checkbox next to SiR (Search in Range).

find out about one or more characters, whose hex value you know

Type the hexadecimal codepoints in the box labeled Code point, and hit return or click on the icon next to it. (See also the next point.)

link to the Unihan database for any Han character

Type the hex number in the box labeled Code point or the character itself in the box labeled Cut & paste, and click on the Unihan icon. The Unihan information will be displayed in a separate page.

You can also find a link to the Unihan database in the detailed, per-character information in the lower right panel when a CJK Unified Ideograph is displayed there.

discover what characters are in a string via cut and paste

Cut and paste the string into the box labeled Cut & paste and hit enter or click on the icon next to it.

convert between hex, decimal, characters and NCRs, or view UTF-8 or UTF-16 code equivalents

Click on the icon  next to either the Code point or the Cut & paste boxes. If there is a code point value or a string of characters in the box to the left, values for those will be automatically shown when the conversion page opens.
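For reference, the kinds of values the conversion page deals with can be sketched in a few lines of Python. This is an illustration only, not the converter's own code, and the function name `describe` and the dictionary keys are made up for the example.

```python
def describe(char):
    """Return common representations of a single character."""
    cp = ord(char)
    utf16 = char.encode("utf-16-be")
    return {
        "hex": f"{cp:04X}",
        "decimal": cp,
        "ncr_hex": f"&#x{cp:X};",
        "ncr_decimal": f"&#{cp};",
        # UTF-8 bytes, two hex digits each.
        "utf8": " ".join(f"{b:02X}" for b in char.encode("utf-8")),
        # UTF-16 code units, four hex digits each (two units for
        # supplementary characters, i.e. a surrogate pair).
        "utf16": " ".join(
            utf16[i:i + 2].hex().upper() for i in range(0, len(utf16), 2)
        ),
    }

for key, value in describe(chr(0x1234)).items():
    print(key, value)
```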

apply a different font to displayed Unicode characters

Click on Options at the top right of the menu panels, then type the name of the font in the input field containing the text Apply font. Then hit enter or click on the  icon to the right. Characters for which there is a glyph in the font will use it.

You can also type a series of fonts, as per the usual CSS syntax, so that if one font is not available the next will be looked for (eg. 'Arial Unicode MS', sans-serif). If you want to use quotes, make them single quotes.

On Gecko-based and Opera browsers, font substitution will ensure that characters will be rendered if they are not in the font chosen but available in another font on your system. In Internet Explorer, if the font chosen doesn't have a glyph for a character, that character will not be displayed. (This can sometimes be useful for determining which characters are contained in a font.)

To return to the default font click the  button to the right. The font application persists until removed.

(You can also specify your preferred default font.)

increase the height of the viewable area in the left panel

Because of some limitations in styling I was unable to work around, a fixed height of 600px is applied to the viewing area in the left panel. You can change this manually by clicking on 'More...' at the bottom right of the menu panels, then changing the value in the field set by default to '600px'. Don't forget to specify the 'px' or other measurement!

This is likely to only be useful when increasing the size of the characters on a monitor with a large number of vertical pixels.

list characters with a given General or Bidirectional property

Select a property from the Show list drop down list. If the SiR checkbox is selected this will show characters only in the specified custom range.

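Note: this is really just a short-cut for a pattern search against the records in the Unicode database, where ;Lu; finds uppercase letters, and (;Lu;|;Ll;|;Lt;|;Lm;|;Lo;) finds any letter. As of version 5.1.0a the normal search input field no longer looks at property fields (see the change history), so use Show list for this kind of query.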
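The reason patterns like ;Lu; work is that each record in the Unicode database is a line of semicolon-separated fields, with the General category in the third field. A minimal Python sketch, using three sample records in the UnicodeData.txt format (the function name `list_by_property` is hypothetical, and this is not UniView's own code):

```python
import re

# Sample records: code point; name; General category; ...; Bidi class; ...
RECORDS = [
    "0031;DIGIT ONE;Nd;0;EN;;1;1;1;N;;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
]

def list_by_property(pattern, records=RECORDS):
    """Return the code points of the records that match the pattern."""
    regex = re.compile(pattern)
    return [line.split(";", 1)[0] for line in records if regex.search(line)]

print(list_by_property(";Lu;"))                        # uppercase letters
print(list_by_property("(;Lu;|;Ll;|;Lt;|;Lm;|;Lo;)"))  # any letter
```

The surrounding semicolons matter: they keep the pattern from matching the same two letters inside a character name.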

While looking at a matrix of characters (bottom left) you can also...

show detailed information for any character in the matrix

Click on the character. The details for that character appear on the right.

quickly transfer a copy of the character to the Cut & Paste and Code Point input fields

Double click on the character.

convert the matrix to a list

Click on the checkbox next to Show range as list.

highlight the characters in the matrix by property (General category and Combining class, or Directionality)

Select the type of property you want from the Highlight selection lists. Characters with the selected property will be highlighted. The highlighting persists until you turn it off.

increase the size of the characters

Click on Options at the top right of the menu panels, then select a zoom factor from the pull down menu next to the label Left panel that contains the text "100%".

[Note: Increasing or decreasing a browser's text zoom can multiply the effect of the selector. As sizes are mostly in pixels now, rather than ems, you can only apply this extra magnification in IE6 by selecting the accessibility options.]

switch between showing or hiding hex numbers around the matrix

Click on Options at the top right of the menu panels, then toggle the button labelled Hide numbers.

find out the decimal codepoint value for a character

Mouse over a character and the decimal code point value pops up in a tooltip.

While looking at a list (bottom left) you can also...

show detailed information for any character in the list

Click on the character. The details for that character appear on the right.

quickly transfer a copy of the character to the Code Point and Cut & Paste input fields

Double click on the line containing the character.

convert the list to a matrix (if this was a range)

Click on the checkbox next to Show range as list.

highlight the characters in the list by property (General category and Combining class, or Directionality)

Select the type of property you want from the Highlight selection lists. Characters with the selected property will be highlighted. The highlighting persists until you turn it off.

find out the decimal codepoint value for a character

Mouse over a character and the decimal code point value pops up in a tooltip.

While looking at the detailed information (bottom right) you can also...

view any character represented by a hex number

Double click on the hex number to highlight it. Then drag and drop, or copy and paste, the highlighted number to the area with a yellow background towards the right of the menu panels. The character will be displayed just above as you move your cursor away.

view the previous character in the Unicode database

Click on the previous-character button at the top of the detailed information pane.

view the next character in the Unicode database

Click on the next-character button at the top of the detailed information pane.

display additional notes about characters where available

Select the DB checkbox to enable this. (I need to fix a bug to get it to work in IE, but it works ok in Firefox and Opera.) When you view character details for which notes exist you will automatically see those notes.

The notes are stored in a database that I compile myself, and which is continuously growing.

The notes are included using AJAX.

If you set up your bookmark to UniView to include database=on this feature will be on by default when you start UniView.

This replaces the previous file-based method of notes inclusion. The files used previously were just copies of parts of the database. Now there is no need to load in notes for different ranges of characters. All notes are available at any time.

look up information about that character in other databases

Click on the decodeUnicode link. A new window will open to show the entry for that character in the decodeUnicode database.

decodeUnicode.org is a wiki where people can contribute information about Unicode blocks and characters. It is developed at the Department of Design at the University of Applied Sciences in Mainz. The project is supported by the Federal Ministry of Education and Research (BMBF) and has the objectives of creating a basis for fundamental typographic research and facilitating a textual approach to the characters of the world for all computer users. (They also provide the graphic versions of characters for UniView.)

Click on the FileFormat link. A new window will open to show the entry for that character in the FileFormat database.

The FileFormat pages provide useful information for Java and .NET programmers.

Click on the Conversion tool link. A new window will open to show a number of possible alternative representations of the character, eg. numeric character entity references, percent escaped forms, hex and decimal codepoint information, etc.

display the block to which the character belongs

Click on the link next to the subheading Script group and that block will be opened in the lower left panel.

You can tailor the program by...

using URIs to start up UniView with data in left or right panels

This is useful for pointing people to particular information using a URI, for example in email. By providing query parameters in the URI you can start up UniView with specific information displayed as follows:

You should only use one of these query parameters in a single call to UniView, with the exception of char=, which can be used with any of the others.
Eg. http://rishida.net/scripts/uniview/?block=latin%20extended-b&char=01C5

You can also start up UniView with character notes as follows:

uniview/?database=on This will automatically load notes from my character database when you view character details in the lower right panel. You can combine this parameter with any other. For more information about notes, see "display additional notes about characters where available" above.
eg. http://rishida.net/scripts/uniview/?block=thai&database=on
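If you generate such links from a script, the query string needs to be percent-encoded (note the %20 for the space in the block name above). A minimal Python sketch, using only the block, char and database parameters that appear in the examples above. The helper name `uniview_uri` is hypothetical.

```python
from urllib.parse import quote, urlencode

BASE = "http://rishida.net/scripts/uniview/"

def uniview_uri(**params):
    """Build a UniView URI from query parameters, encoding spaces as %20."""
    return BASE + "?" + urlencode(params, quote_via=quote)

print(uniview_uri(block="latin extended-b", char="01C5"))
print(uniview_uri(block="thai", database="on"))
```

Passing `quote_via=quote` makes `urlencode` write spaces as %20 rather than +, which matches the form used in the examples above.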

setting default display preferences

By providing query parameters when you call UniView you can modify the default settings for look and feel as follows:

You can use all or none of these query parameters in a single call to UniView.

If you store your bookmark with these parameters set, you will always open UniView with your preferences.

Acknowledgements and thanks

François Yergeau co-developed the Unicode Code Converter utility, and translated it into French.

Patrick Andries translated UniView into French.

Change history

Changes in version 5.1.0a

A large amount of code was rewritten to enable data to be downloaded from the server via AJAX at the point of need. This eliminates the long wait when you start to use UniView without the database information in your cache. This means that there is a slightly longer delay when you view a new block, but the code is designed so that if you have already downloaded data, you don't have to retrieve it again from the server.

The search mechanism was also rewritten. The regular expressions used must now be supported in both JavaScript and PHP (PHP is used if not searching within the current range). When 'other' is ticked, the search will look in the alternative name fields, but not in other property settings, so you can no longer use something like ;AL; to search for characters with a particular property. (Use 'Show list' instead.)

Removed several zero-width space characters from the code, which means that UniView now works with Google Chrome, except for some annoying display bugs that I'm not sure how to fix - for example, the first time you try to display any block you only seem to get the top line (although, if you click or drag the mouse, the block is actually there). This seems to be WebKit related, since it happens in Safari, too.

Changes in version 5.1.0

Updated to cover Unicode Version 5.1.0.

Added <option value="(;R;)|(;AL;)">Right-to-left (R or AL)</option> to property lister.

Bugfix: fixed ranges supplied via URI query (used to still split).

Changes in version 5.0.0c

Changed the custom range input to a single field that will accept various range formats.

Added the ability to select whether Search looks at any combination of character names only, other parts of a record in the Unicode database, or the other character description information, and added a message to say how many characters were matched.

Added the ability to search within the range specified in the field entitled Range.

Added the ability to list characters with a given General or Bidirectional property (within a specified range or not).

Added an AJAX link to my database of information about Unicode characters. If enabled, using the DB checkbox, this automatically retrieves any available data for a character when information about that character is displayed in the lower right panel. You can also specify that UniView should open with that set as the default using database=on in the URI used to call UniView.

Because of the previous improvement, I removed the ability to link in a file of information about characters. (The information in the files was a copy of the information in the database.)

Moved the Code point(s) and Cut & paste fields lower, to make them easier to use.

Fixed a bug that was preventing the Search function finding characters in the Basic Latin block.

Bugfix: a range like 0036:0067 will always show full rows now; a range with start higher than end will show alert.

Added a reference to decodeunicode when graphics are displayed in the left column.

Bugfix: the search parameter won't break when graphics etc. are toggled.

You can now specify a windowHeight parameter at startup in the URI's query string.

Changes in version 5.0.0b

Extended the ability to open UniView with data displayed from a URI. In addition to specifying a block and a character, you can now specify a range, a list of codepoints, a list of characters, or a search string. This is useful for pointing people to results using URIs in links or email.

Switching between graphics or fonts for display of characters now refreshes the right panel also.

Clicking on the information about the script group of a character displayed in the right panel will cause that block to be displayed in the left panel. This is particularly useful when you find a single character and want to know what's around it.

Replaced the use of hyphens to specify block names in URI queries with underscores or %20. This may break some existing URIs, but fixes a bug that meant that block names that actually contain hyphens were not displaying.

Added an option to the right hand panel to display the current character in the Unicode Conversion tool.

Fixed some other bugs related to specifying Basic Latin block in a URI.

Reinstated CJK Unified Ideographs and Hangul Syllables in the block selection pull-down, but added a warning and opt out if the block you are about to display contains more than 2000 characters. Also added a warning and opt out if you try to specify a range of over 2000 characters.

Changes in version 5.0.0a

Substantially revised the code so that UniView now handles ideographic and hangul characters and other characters not in the Unidata database. For example, ideographs now display in the left panel for a specified range and property values are available in the right panel.

Added regular expression support to the search input field.

Changes to the user interface: moved highlighting controls to the initial screens and moved others, such as the chart numbering toggle, to the submenu under "Options"; provided wider input fields for codepoint and cut&paste input; replaced the graphics and list toggle icons with checkboxes; provided an icon to quickly clear the contents of the codepoint and cut&paste input fields. A link to the UniHan database was added alongside the Cut & paste input field: when clicked, this icon looks up the first character in either field. A link to the UniHan database was also added to the right panel when a Unified CJK character is displayed there.

The Codepoint input field now accepts more than one codepoint (separated by spaces).

When you double-click on a character in the left panel the codepoint is appended to the Codepoint input field and the character is added to the Cut & paste field.

When you click in the checkbox Show as graphics the change is immediately applied to whatever is in the left panel. It no longer redisplays the range if you are looking at, say, a list of characters generated by the Codepoint input, but redisplays the same list.

Set the default font to "Arial Unicode MS, sans-serif".

Added a message for those who do not have JavaScript turned on, and messages to please wait while data is being downloaded on initial startup.

Fixed the icons linking to the converter tool, so that the contents of the adjacent field are passed to the converter and converted automatically.

Added links in the right panel to FileFormat pages (in addition to decodeUnicode). The FileFormat pages provide useful information for Java and .NET users about a given character.

Removed the option to specify your own character notes (I'm not aware that anyone ever did, since it hasn't worked for a while now and no-one has complained). This is because AJAX technology will not allow an XML file to be included from another domain. When that is fixed I will reinstate it.

Fixed a number of other bugs, particularly related to supplementary character support and highlighting.

Changes in version 5.0.0

Updated to support Unicode 5.0.0.

Restyled the menu panels, moving some less used functions to pop up windows to save on horizontal space.

Implemented an AJAX approach for incorporating notes files. This means that the page no longer has to be reloaded to add notes. It is now also possible to add more than one set of notes at a time. Note that these changes require a small change to the markup of notes files: the div containing the notes for display has to have a class name 'notes' as well as the id for the character.

I added some bundled notes files - most notably myanmar. Note that these are subject to change on an ongoing basis.

Most of the properties displayed in the character-detail panel on the right are taken from the unicodedata file at the moment. I plan to incorporate additional property information over the coming months, but wanted to release this now so that you can get information about Unicode 5 characters sooner rather than later.

Changes in version 4.1.0b

Added a link to the decodeUnicode wiki for each character that is displayed in the right-hand panel.

Provided a way to start up UniView with a particular block and/or character displayed as a table in the lower panels. This should be particularly useful for pointing a person to a particular Unicode block or character in a URI.

Fixed a couple of minor bugs in the CSS.

Changes in version 4.1.0a

Rearranged the top of the page to allow UniView to be used in narrower windows.

Added support for Unicode version 4.1.0.

Retrieves graphics from decodeunicode.org rather than the slow-loading and sparse graphics that were available from the Unicode site. Also added my own graphics where decodeunicode has gaps.

Moved the files to PHP. This enables a different approach to the inclusion of user-defined notes that now works on IE and Opera, too.

Another benefit of using PHP is that you can now prep the conversion page with data in the 'Code point' or 'Cut & paste' fields. By clicking on the appropriate icon, the conversion page will now open with the conversions already done for the relevant field.

Yet another benefit of PHP is that, if you really want to, you can now set various preferences related to the initial look and feel by specifying them as query parameters when you call UniView.

NOTE: If you want to be able to download UniView to your hard drive and you don't have a server and PHP, let me know. If enough people ask for it, I will create a downloadable zipped package again that will work without PHP (and without the additional notes feature). I will also post notes on how to customise various aspects of the setup.

Note also that I have disabled links to the French version until a new French translation has been prepared. I will probably not do language-based content negotiation.

Changes in version 4.1

Surrogate support added.

You can now double-click on any line in a list on the left, and the character will appear in the Cut&Paste field above.

Han and Hangul character glyphs are now displayed in the right panel after entering a codepoint in the Code Point field. There may not be much information available, but at least you can see the character if you have a font that supports it.

Changes in version 4.0.1

Minor improvements to user interface, including provision of tooltips for all feature selectors.

Disabled (attempts to) display user-defined notes for IE and Opera. I still haven't found out how to make it work, even using proprietary coding, but at least the attempt won't crash the browser now.

Provided a facility to allow visible area in left panel to be increased.

Changes in version 4.0

Name changed to UniView.

Support for Unicode 4.0.0.

No frames. Cross-browser support.

You can specify your preferred default font for display of Unicode characters in prefs.js. If an alternative font is applied using the control on the page, it remains in force for any view until the user sets it back to the default.

Highlighting of General or Bidi properties remains in force until you disable it, and applies to any matrix or list in the left panel (ie. including search results and cut & paste results).

Script blocks are now grouped with visible labels in the main range-selection pulldown.

Mousing over a character in matrix or list view produces a tool tip containing the decimal code value for the character. In the previous version this was the Hex value, and was limited to the matrix view.

There are no facilities to display information in a pop-up window instead of in the main window. If you want to temporarily display information separately, open a new window.

You used to be able to double-click on a list or on the character descriptions to make the highlighted text appear in various fields. This has not been implemented, but you can still highlight and drag, or copy and paste the text.

Because character sizes are specified in pixels for cross-browser consistency, you must use IE's accessibility options to increase character size in IE over and above what is available from the font size setting provided on the page.

Options for displaying in-page descriptions of script blocks have been disabled. Open the files in a separate tab or window as a standalone file.

Things still to be done

Add more properties.

Author: Richard Ishida.


Last update 2008-11-01 7:51 GMT