UniView is an XHTML-based application for looking up characters and character blocks, pasting in and identifying unknown characters, storing your own information about characters, searching character data, doing hex/dec/NCR conversions, highlighting character types, and so on. It supports Unicode 5.1.0 and is written using Web standards to work on a variety of browsers.
This help file relates to Version 5.1.0a of UniView. The main change in this version is the speed of the initial download of the page. Since data is fetched using AJAX on a just-in-time basis, you no longer have to download and cache the Unicode database before you start. For details see the change history.
The first three digits of the UniView version number reflect the version of Unicode that it supports. This version therefore supports Unicode version 5.1.0.
Either select the range from the pull-down list at the top left, or type it using hex numbers into the Custom range text box and click on the icon next to it.
The Custom range field will accept various formats. The numbers must be in hexadecimal form and separated by a colon (the default), a hyphen, one or more spaces, or one or more periods. The numbers can be in the following formats: 1234, &#x1234;, &#1234;, \u1234, U+1234. The actual number of hex digits can be between 1 and 6.
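As a rough illustration of the rules above, the following Python sketch parses a custom range string. It is not UniView's own code (UniView itself uses JavaScript and PHP). The names `parse_code_point` and `parse_range` are made up for this example, and it assumes only the bare, `\u`, `U+` and hexadecimal character reference forms need handling.

```python
import re

# Separators named in the text: a colon, a hyphen, spaces, or periods.
SEPARATOR = re.compile(r"\s*(?::|-|\.+|\s+)\s*")
# Between 1 and 6 hex digits, as the text requires.
HEX_DIGITS = re.compile(r"[0-9A-Fa-f]{1,6}")

# (prefix, suffix) wrappers this sketch recognises around the hex digits.
WRAPPERS = (("\\u", ""), ("U+", ""), ("&#x", ";"), ("", ""))


def parse_code_point(token):
    """Return the integer value of one hexadecimal code point token."""
    for prefix, suffix in WRAPPERS:
        if token.startswith(prefix) and token.endswith(suffix):
            digits = token[len(prefix):len(token) - len(suffix)]
            if HEX_DIGITS.fullmatch(digits):
                return int(digits, 16)
    raise ValueError(f"not a hex code point: {token!r}")


def parse_range(text):
    """Split 'start<separator>end' and return (start, end) as integers."""
    parts = SEPARATOR.split(text.strip())
    if len(parts) != 2:
        raise ValueError(f"expected two code points: {text!r}")
    start, end = (parse_code_point(part) for part in parts)
    if start > end:
        raise ValueError("range start is higher than range end")
    return start, end
```

For example, `parse_range("0036:0067")` returns `(54, 103)`, and `parse_range("U+1234..U+1240")` gives the same result as `parse_range("1234 1240")`.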
You can display the result as either a table of characters or a list (that includes names) by clicking the checkbox below entitled Show range as list.
Unassigned character positions are shown in the matrix with a greyed out background (though you can change the colour, if you want).
Click on the checkbox next to Show as graphics to toggle between font glyphs and graphics. Using font glyphs (UTF-8 text) loads the page faster, but relies on you having a good Unicode font or selection of fonts to cover the Unicode code points. Graphics will be downloaded from the decodeunicode server by default.
If you are looking for fonts I recommend David McCreedy's Gallery of Unicode Fonts or Alan Wood’s Unicode Resources. You should be able to find free fonts there for most characters. (You can also change the default font in UniView, if you wish.)
Type in the string you want to search for in the box labeled Search string and hit enter or click on the icon next to it.
You can also use regular expressions in searches. For example, suppose you wanted to find all characters whose name contains the word 'tet'. You could type \btet\b into the input field. The \b represents a word boundary. If you wanted to search for entries containing either the word 'tet' or the word 'tat', you can use the 'or' operator |, as in \btet\b|\btat\b.
Another example: you want to search for 'alpha', but you only want results for the Latin characters (not the many Greek or mathematical results). Simply use the search string latin.*alpha. The .* represents any number of intervening characters.
I haven't tested this feature to destruction, but most basic regular expressions that work in both JavaScript and PHP code should work.
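The patterns above can be tried outside UniView. The sketch below uses Python's `re` and `unicodedata` modules rather than UniView's JavaScript and PHP code, so it only illustrates how the patterns behave. The function name `search_names` is invented for the example, case-insensitive matching is an assumption, it searches character names only, and Python's character database may be newer than Unicode 5.1.0.

```python
import re
import unicodedata


def search_names(pattern, first=0x0000, last=0xFFFF):
    """Return the code points in [first, last] whose name matches pattern."""
    # Assumption: matching ignores case, since names are stored in capitals.
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    for cp in range(first, last + 1):
        # unicodedata.name() returns the default "" for unnamed code points.
        name = unicodedata.name(chr(cp), "")
        if name and rx.search(name):
            hits.append(cp)
    return hits


# List Latin characters with ALPHA in the name, skipping the Greek ones.
for cp in search_names(r"latin.*alpha"):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```

With `\btet\b` the word boundaries keep HEBREW LETTER TET in the results while excluding names in which 'tet' is only part of a longer word.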
Note that by default a search matches against any information in the main Unicode database, not just character names, and also against the information displayed for an individual character under the heading Description in the right panel. You can limit the search using the Names, Descriptions and Other checkboxes under the search input field.
You can also limit the search to the specific range of characters in the Custom range field. This range can be fixed manually or by selecting a Unicode block just above. To limit the search, select the checkbox next to SiR (Search in Range).
Type the hexadecimal codepoints in the box labeled Code point, and hit return or click on the icon next to it. (See also the next point.)
Type the hex number in the box labeled Code point or the character itself in the box labelled Cut & paste, and click on the Unihan icon alongside the Cut & paste field. The Unihan information will be displayed in a separate page.
You can also find a link to the Unihan database in the detailed per-character information in the lower right panel when a CJK Unified Ideograph is displayed there.
Cut and paste the string into the box labeled Cut & paste and hit enter or click on the icon next to it.
Click on the icon next to either the Code point or the Cut & paste boxes. If there is a code point value or a string of characters in the box to the left, values for those will be automatically shown when the conversion page opens.
Click on Options at the top right of the menu panels, then type the name of the font in the input field containing the text Apply font. Then hit enter or click on the icon to the right. Characters for which there is a glyph in the font will use it.
You can also type a series of fonts, as per the usual CSS syntax, so that if one font is not available the next will be looked for (eg. 'Arial Unicode MS', sans-serif). If you want to use quotes, make them single quotes.
On Gecko-based and Opera browsers, font substitution will ensure that characters will be rendered if they are not in the font chosen but available in another font on your system. In Internet Explorer, if the font chosen doesn't have a glyph for a character, that character will not be displayed. (This can sometimes be useful for determining which characters are contained in a font.)
To return to the default font click the button to the right. The font application persists until removed.
(You can also specify your preferred default font.)
Because of some limitations in styling I was unable to work around, a fixed height of 600px is applied to the viewing area in the left panel. You can change this manually by clicking on 'More...' at the bottom right of the menu panels, then changing the value in the field set by default to '600px'. Don't forget to specify the 'px' or other measurement!
This is likely to only be useful when increasing the size of the characters on a monitor with a large number of vertical pixels.
Select a property from the Show list drop down list. If the SiR checkbox is selected this will show characters only in the specified custom range.
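Note: this was originally just a short-cut for a search you could perform with the normal search input field, where ;Lu; would find uppercase letters and (;Lu;|;Ll;|;Lt;|;Lm;|;Lo;) would find any letter. As of version 5.1.0a the search field no longer looks at property values (see the change history), so use Show list instead.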
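Patterns such as ;Lu; pick out a property value because each record in the Unicode database file UnicodeData.txt is a single semicolon-delimited line, with the General Category in the third field and the Bidi class in the fifth. The Python sketch below shows the idea on four sample records. The function name `matching_code_points` is invented for the example, and this is not how UniView itself is implemented.

```python
import re

# A few records in the semicolon-delimited UnicodeData.txt layout:
# code point; name; General Category; combining class; Bidi class; ...
SAMPLE_RECORDS = [
    "0031;DIGIT ONE;Nd;0;EN;;1;1;1;N;;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
    "05D0;HEBREW LETTER ALEF;Lo;0;R;;;;;N;;;;;",
]


def matching_code_points(pattern, records=SAMPLE_RECORDS):
    """Return the code point field of every record the pattern matches."""
    rx = re.compile(pattern)
    return [record.split(";", 1)[0] for record in records if rx.search(record)]


print(matching_code_points(";Lu;"))                       # uppercase letters
print(matching_code_points("(;Lu;|;Ll;|;Lt;|;Lm;|;Lo;)"))  # any letter
print(matching_code_points("(;R;)|(;AL;)"))               # right-to-left
```

The surrounding semicolons matter: without them, a pattern like Lu could also match inside a character name.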
Click on the character. The details for that character appear on the right.
Double click on the character.
Click on the checkbox next to Show as graphics.
Select the type of property you want from the Highlight selection lists. Characters with the selected property will be highlighted. The highlighting persists until you turn it off.
Click on Options at the top right of the menu panels, then select a zoom factor from the pull down menu next to the label Left panel that contains the text "100%".
[Note: Increasing or decreasing a browser's text zoom can multiply the effect of the selector. As sizes are mostly in pixels now, rather than ems, you can only apply this extra magnification in IE6 by selecting the accessibility options.]
Click on Options at the top right of the menu panels, then toggle the button labelled Hide numbers.
Mouse over a character and the decimal code point value pops up in a tooltip.
Click on the character. The details for that character appear on the right.
Double click on the line containing the character.
Click on the checkbox next to Show as graphics.
Select the type of property you want from the Highlight selection lists. Characters with the selected property will be highlighted. The highlighting persists until you turn it off.
Mouse over a character and the decimal code point value pops up in a tooltip.
Double click on the hex number, and release the mouse button. Then click on the highlighted text and drag and drop or copy and paste the Hex number to the area with a yellow background towards the right of the menu panels. The character will be displayed just above as you move your cursor away.
Click on the button at the top of the detailed information pane.
Click on the button at the top of the detailed information pane.
Select the DB checkbox to enable this. (I need to fix a bug to get it to work in IE, but it works ok in Firefox and Opera.) When you view character details for which notes exist you will automatically see those notes.
The notes are stored in a database compiled by myself, and are continuously growing.
The notes are included using AJAX.
If you set up your bookmark to UniView to include database=on this feature will be on by default when you start UniView.
This replaces the previous file-based method of notes inclusion. The files used previously were just copies of parts of the database. Now there is no need to load in notes for different ranges of characters. All notes are available at any time.
Click on the decodeUnicode link. A new window will open to show the entry for that character in the decodeUnicode database. decodeUnicode is a wiki where people can provide information about characters.
decodeUnicode.org is a wiki where people can contribute information about Unicode blocks and characters. It is developed at the Department of Design at the University of Applied Sciences in Mainz. The project is supported by the Federal Ministry of Education and Research (BMBF) and has the objectives of creating a basis for fundamental typographic research and facilitating a textual approach to the characters of the world for all computer users. (They also provide the graphic versions of characters for UniView.)
Click on the FileFormat link. A new window will open to show the entry for that character in the FileFormat database.
The FileFormat pages provide useful information for Java and .Net programmers.
Click on the Conversion tool link. A new window will open to show a number of possible alternative representations of the character, eg. numeric character entity references, percent escaped forms, hex and decimal codepoint information, etc.
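To give a feel for the kinds of representation involved, here is a small Python sketch that derives several of them for one character. It is an independent illustration, not the Conversion tool's code. The function name `representations` and the dictionary keys are made up for the example.

```python
from urllib.parse import quote


def representations(char):
    """Return a few alternative representations of a single character."""
    cp = ord(char)
    return {
        "U+ notation": f"U+{cp:04X}",
        "hex NCR": f"&#x{cp:X};",
        "decimal NCR": f"&#{cp};",
        # Percent-escaping applies to the bytes of the UTF-8 encoding.
        "percent-escaped UTF-8": quote(char, safe=""),
        "hex code point": f"{cp:X}",
        "decimal code point": str(cp),
    }


for label, value in representations("\u01C5").items():
    print(f"{label}: {value}")
```

For U+01C5 this gives the hex NCR `&#x1C5;`, the decimal NCR `&#453;` and the percent-escaped form `%C7%85`.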
Click on the link next to the subheading Script group and that block will be opened in the lower left panel.
This is useful for pointing people to particular information using a URI, for example in email. By providing query parameters in the URI you can start up UniView with specific information displayed as follows:
You should only use one of these query parameters in a single call to UniView, with the exception of char=, which can be used with any of the others.
Eg. http://rishida.net/scripts/uniview/?block=latin%20extended-b&char=01C5
You can also start up UniView with character notes as follows:
uniview/?database=on This will automatically load notes from my character database when you view character details in the lower right panel. You can combine this parameter with any other. For more information about notes, see "display additional notes about characters where available" above.
eg. http://rishida.net/scripts/uniview/?block=thai&database=on
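If you generate such links from a script, the block name needs its spaces escaped as %20. The Python sketch below builds the two example URIs shown above using only the parameters named in this help file (block, char and database). The helper name `uniview_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://rishida.net/scripts/uniview/"


def uniview_url(**params):
    """Build a UniView URI from keyword arguments used as query parameters."""
    # quote_via=quote escapes spaces as %20 rather than '+'.
    query = urlencode(params, quote_via=quote)
    return f"{BASE_URL}?{query}" if query else BASE_URL


print(uniview_url(block="latin extended-b", char="01C5"))
print(uniview_url(block="thai", database="on"))
```

The first call reproduces the Latin Extended-B example and the second the Thai example with notes switched on.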
By providing query parameters when you call UniView you can modify the default settings for look and feel as follows:
You can use all or none of these query parameters in a single call to UniView.
If you store your bookmark with these parameters set, you will always open UniView with your preferences.
François Yergeau co-developed the Unicode Code Converter utility, and translated it into French.
Patrick Andries translated UniView into French.
A large amount of code was rewritten to enable data to be downloaded from the server via AJAX at the point of need. This eliminates the long wait when you start to use UniView without the database information in your cache. This means that there is a slightly longer delay when you view a new block, but the code is designed so that if you have already downloaded data, you don't have to retrieve it again from the server.
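The fetch-once behaviour described here can be pictured with a small cache. The Python sketch below is only a model of the idea: UniView does this in JavaScript via AJAX, and the names `BlockCache` and `fake_fetch` are invented for the example, with `fake_fetch` standing in for the request to the server.

```python
class BlockCache:
    """Fetch data for a block at the point of need and remember it."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable that retrieves data for one block
        self._blocks = {}     # block name -> data already downloaded

    def get(self, block_name):
        # Only go back to the server for blocks not seen before.
        if block_name not in self._blocks:
            self._blocks[block_name] = self._fetch(block_name)
        return self._blocks[block_name]


calls = []


def fake_fetch(block_name):
    """Stand-in for a server request; records each call that is made."""
    calls.append(block_name)
    return f"data for {block_name}"


cache = BlockCache(fake_fetch)
cache.get("thai")
cache.get("thai")              # served from the cache, no second fetch
cache.get("latin extended-b")
print(calls)                   # ['thai', 'latin extended-b']
```

The trade-off is the one noted above: the first view of a block pays for a round trip, and later views of the same block do not.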
The search mechanism was also rewritten. The regular expressions used must now be supported in both JavaScript and PHP (PHP is used if not searching within the current range). When 'other' is ticked, the search will look in the alternative name fields, but not in other property settings, so you can no longer use something like ;AL; to search for characters with a particular property. (Use 'Show list' instead.)
Removed several zero-width space characters from the code, which means that UniView now works with Google Chrome, except for some annoying display bugs that I'm not sure how to fix - for example, the first time you try to display any block you only seem to get the top line (although, if you click or drag the mouse, the block is actually there). This seems to be WebKit related, since it happens in Safari, too.
Updated to cover Unicode Version 5.1.0.
Added <option value="(;R;)|(;AL;)">Right-to-left (R or AL)</option> to property lister.
Bugfix: fixed ranges supplied via URI query (used to still split).
Changed the custom range input to a single field that will accept various range formats.
Added the ability to select whether Search looks at any combination of character names only, other parts of a record in the Unicode database, or the other character description information, and added a message to say how many characters were matched.
Added the ability to search within the range specified in the field entitled Range.
Added the ability to list characters with a given General or Bidirectional property (within a specified range or not).
Added an AJAX link to my database of information about Unicode characters. If enabled, using the DB checkbox, this automatically retrieves any available data for a character when information about that character is displayed in the lower right panel. You can also specify that UniView should open with that set as the default using database=on in the URI used to call UniView.
Because of the previous improvement, I removed the ability to link in a file of information about characters. (The information in the files was a copy of the information in the database.)
Moved the Code point(s) and Cut & paste fields lower, to make them easier to use.
Fixed a bug that was preventing the Search function finding characters in the Basic Latin block.
Bugfix: a range like 0036:0067 will always show full rows now; a range with a start higher than its end will show an alert.
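Showing full rows amounts to rounding the start of the range down, and the end up, to row boundaries. The Python sketch below assumes rows of 16 code points, as in the standard Unicode charts. The row width is not stated in this help file, and the function name `expand_to_full_rows` is made up for the example.

```python
ROW_WIDTH = 16  # assumption: one row per 16 code points


def expand_to_full_rows(start, end):
    """Widen [start, end] so that it begins and ends on row boundaries."""
    if start > end:
        # UniView shows an alert here; the sketch raises an error instead.
        raise ValueError("range start is higher than range end")
    first = start - start % ROW_WIDTH
    last = end - end % ROW_WIDTH + ROW_WIDTH - 1
    return first, last


first, last = expand_to_full_rows(0x0036, 0x0067)
print(f"{first:04X}:{last:04X}")  # 0030:006F
```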
Added a reference to decodeunicode when graphics are displayed in the left column.
Bugfix: the search parameter won't break when graphics etc. are toggled.
You can now specify a windowHeight parameter at startup in the URI's query string.
Extended the ability to open UniView with data displayed from a URI. In addition to specifying a block and a character, you can now specify a range, a list of codepoints, a list of characters, or a search string. This is useful for pointing people to results using URIs in links or email.
Switching between graphics or fonts for display of characters now refreshes the right panel also.
Clicking on the information about the script group of a character displayed in the right panel will cause that block to be displayed in the left panel. This is particularly useful when you find a single character and want to know what's around it.
Replaced the use of hyphens to specify block names in URI queries with underscores or %20. This may break some existing URIs, but fixes a bug that meant that block names that actually contain hyphens were not displaying.
Added an option to the right hand panel to display the current character in the Unicode Conversion tool.
Fixed some other bugs related to specifying Basic Latin block in a URI.
Reinstated CJK Unified Ideographs and Hangul Syllables in the block selection pull-down, but added a warning and opt-out if the block you are about to display contains more than 2000 characters. Also added a warning and opt-out if you try to specify a range of over 2000 characters.
Substantially revised the code so that UniView now handles ideographic and hangul characters and other characters not in the Unidata database. For example, ideographs now display in the left panel for a specified range and property values are available in the right panel.
Added regular expression support to the search input field.
Changes to the user interface: moved highlighting controls to the initial screens and moved others, such as the chart numbering toggle, to the submenu under "Options"; provided wider input fields for codepoint and cut & paste input; replaced the graphics and list toggle icons with checkboxes; provided an icon to quickly clear the contents of the codepoint and cut & paste input fields. A link to the UniHan database was added alongside the Cut & paste input field: when clicked, this icon looks up the first character in either field. A link to the UniHan database was also added to the right panel when a Unified CJK character is displayed there.
The Codepoint input field now accepts more than one codepoint (separated by spaces).
When you double-click on a character in the left panel the codepoint is appended to the Codepoint input field as well as adding the character to the Cut & paste field.
When you click in the checkbox Show as graphics the change is immediately applied to whatever is in the left panel. It no longer redisplays the range if you are looking at, say, a list of characters generated by the Codepoint input, but redisplays the same list.
Set the default font to "Arial Unicode MS, sans-serif".
Added a message for those who do not have JavaScript turned on, and messages to please wait while data is being downloaded on initial startup.
Fixed the icons linking to the converter tool, so that the contents of the adjacent field are passed to the converter and converted automatically.
Added links in the right panel to FileFormat pages (in addition to decodeUnicode). The FileFormat pages provide useful information for Java and .Net users about a given character.
Removed the option to specify your own character notes (I'm not aware that anyone ever did, since it hasn't worked for a while now and no-one has complained). This is because AJAX technology will not allow an XML file to be included from another domain. When that is fixed I will reinstate it.
Fixed a number of other bugs, particularly related to supplementary character support and highlighting.
Updated to support Unicode 5.0.0.
Restyled the menu panels, moving some less used functions to pop up windows to save on horizontal space.
Implemented an AJAX approach for incorporating notes files. This means that the page no longer has to be reloaded to add notes. It is now also possible to add more than one set of notes at a time. Note that these changes require a small change to the markup of notes files: the div containing the notes for display has to have a class name 'notes' as well as the id for the character.
I added some bundled notes files - most notably myanmar. Note that these are subject to change on an ongoing basis.
Most of the properties displayed in the character-detail panel on the right are taken from the unicodedata file at the moment. I plan to incorporate additional property information over the coming months, but wanted to release this now so that you can get information about Unicode 5 characters sooner rather than later.
Added a link to the decodeUnicode wiki for each character that is displayed in the right-hand panel.
Provided a way to start up UniView with a particular block and/or character displayed as a table in the lower panels. This should be particularly useful for pointing a person to a particular Unicode block or character in a URI.
Fixed a couple of minor bugs in the CSS.
Rearranged the top of the page to allow UniView to be used in narrower windows.
Added support for Unicode version 4.1.0.
Retrieves graphics from decodeunicode.org rather than the slow-loading and sparse graphics that were available from the Unicode site. Also added my own graphics where decodeunicode has gaps.
Moved the files to PHP. This enables a different approach to the inclusion of user-defined notes that now works on IE and Opera, too.
Another benefit of using PHP is that you can now prep the conversion page with data in the 'Code point' or 'Cut & paste' fields. By clicking on the appropriate icon, the conversion page will now open with the conversions already done for the relevant field.
Yet another benefit of PHP is that, if you really want to, you can now set various preferences related to the initial look and feel by specifying them as query parameters when you call UniView.
NOTE: If you want to be able to download UniView to your hard drive and you don't have a server and PHP, let me know. If enough people ask for it, I will create a downloadable zipped package again that will work without PHP (and without the additional notes feature). I will also post notes on how to customise various aspects of the setup.
Note also that I have disabled links to the French version until a new French translation has been prepared. I will probably not do language-based content negotiation.
Surrogate support added.
You can now double-click on any line in a list on the left, and the character will appear in the Cut&Paste field above.
Han and Hangul character glyphs are now displayed in the right panel after entering a codepoint in the Code Point field. There may not be much information available, but at least you can see the character if you have a font that supports it.
Minor improvements to user interface, including provision of tooltips for all feature selectors.
Disabled (attempts to) display user-defined notes for IE and Opera. I still haven't found how to make it work yet, even using proprietary coding, but at least the attempt won't crash the browser now.
Provided a facility to allow visible area in left panel to be increased.
Name changed to UniView.
Support for Unicode 4.0.0.
No frames. Cross-browser support.
You can specify your preferred default font for display of Unicode characters in prefs.js. If an alternative font is applied using the control on the page, it remains in force for any view until the user sets it back to the default.
Highlighting of General or Bidi properties remains in force until you disable it, and applies to any matrix or list in the left panel (ie. including search results and cut & paste results).
Script blocks are now grouped with visible labels in the main range-selection pulldown.
Mousing over a character in matrix or list view produces a tool tip containing the decimal code value for the character. In the previous version this was the Hex value, and was limited to the matrix view.
There are no facilities to display information in a pop-up window instead of in the main window. If you want to temporarily display information separately, open a new window.
You used to be able to double-click on a list or on the character descriptions to make the highlighted text appear in various fields. This has not been implemented, but you can still highlight and drag, or copy and paste the text.
Because character sizes are specified in pixels for cross-browser consistency, you must use IE's accessibility options to increase character size in IE over and above what is available from the font size setting provided on the page.
Options for displaying in-page descriptions of script blocks have been disabled. Open the files in a separate tab or window as standalone files.
Add more properties.