
Unicode Code Converter v7.05

Type or paste text in any of the green or grey shaded boxes and click the Convert button above it. Alternative representations will appear in all the other boxes. You can then cut & paste the results into your document. Select selects all the text in a box. Scroll down the page for notes on other options.

 Mixed input: Select, Convert numbers as, Convert \x
 Characters: Select
 HTML/XML: Select, Escape invisible characters, Convert bidi controls to HTML markup
 Percent encoding for URIs: Select
 Hexadecimal NCRs: Select, Show ascii, Latin1
 Decimal NCRs: Select, Show ascii, Latin1
 Unicode U+hex notation: Select, Show ascii, Latin1
 0x... notation: Select, Show ascii, Latin1
 Hexadecimal code points: Select, Show ascii, Latin1
 Decimal code points: Select, Show ascii, Latin1
 UTF-8 code units: Select
 UTF-16 code units: Select
 JavaScript escapes: Select, Use C-style Supp. Chars.
 CSS escapes: Select

Notes

Most of the time you will probably drop the text to be converted in the Mixed input field, and hit the associated Convert button. This will convert all escapes to characters, then convert that into each of the forms listed against the boxes below. If your text contains bare numbers that you also want to convert, use one of the convert buttons to the right. (Be aware, however, that in this case something like 'ab' could be interpreted as a hex number.)

Note also that escapes of the form \x, where x is one of a-zA-Z0-9, are not recognised by default. If you check the box next to Convert \x, only the special JavaScript escapes are recognised (e.g. \b, \n, \t, \"). For full CSS behaviour, use the CSS escapes field.

If you only want to convert a specific type of escape and leave all others untouched, paste the text into one of the other boxes and hit its associated Convert button.

You can also pass a string to the page using the q parameter in the URI. For example, http://rishida.net/tools/conversion/?q=Crêpes. You can also pass a string with escapes in it, but you will need to be careful to percent escape characters such as &, + and # which affect the URI syntax. For example, http://rishida.net/tools/conversion/?q=CrU%2B00EApes.
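
As a rough illustration of the percent-escaping described above, the following Python sketch builds such a URI. The helper name converter_url is hypothetical; only the base address and the q parameter come from the text.

```python
from urllib.parse import quote

# Base address as given in the text above.
BASE = "http://rishida.net/tools/conversion/"

def converter_url(text: str) -> str:
    """Return a converter URI with `text` percent-escaped in the q parameter."""
    # safe="" makes quote() escape &, + and # too, since they affect URI syntax.
    return BASE + "?q=" + quote(text, safe="")

print(converter_url("CrU+00EApes"))  # ends with ?q=CrU%2B00EApes
print(converter_url("Crêpes"))       # ends with ?q=Cr%C3%AApes
```

Non-ASCII characters are escaped here as their UTF-8 bytes, which is what quote() does by default.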

The following describe how the various boxes work. Input describes what happens if you paste or type text into the named field and hit Convert. Output describes the output in the named field if you hit Convert elsewhere.

Characters
Input:  Everything is treated as characters, eg. U+1234 is not treated as an escape.
Output:  Everything displayed as characters.
Other:  You can see a list of names of characters in the Characters field by clicking on the View names button. You can view more detail for each character by clicking on View in UniView.

HTML/XML
Input: Numeric character references or HTML character entities (except &lt; &gt; &quot; and &amp;) are converted to ordinary characters.
Output: Ordinary characters, except that < > " and & are converted to character entities. This is useful for preparing examples of sample code for HTML or XML.
Other: By default the control Escape invisible characters is checked. This will cause selected invisible characters (such as RLM) or ambiguous characters (such as NO-BREAK SPACE) to be converted to escaped form. The set of affected characters will be extended over time.
There is another control, Convert bidi controls to HTML markup, that will convert RLE, LRE and PDF to HTML markup based on a span element. Note that if your text contains RLO or LRO plus PDF, the PDF will incorrectly be converted to </span> at the moment. I may fix this (and thereby allow RLO/LRO conversion too) at a later date. (Hint: if you want to get the result into source code form, once the initial conversion has been done just click Convert above this text area, and then look in the Characters text area.)
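
The basic input and output behaviour of this box (without the invisible-character and bidi options) might be approximated in Python as follows. This is a sketch, not the page's own code; the function names are hypothetical, and html.unescape from the standard library stands in for the entity lookup.

```python
import html
import re

# The four markup characters that are written as entities on output.
MARKUP = {"<": "&lt;", ">": "&gt;", '"': "&quot;", "&": "&amp;"}
# Entity names that are left escaped on input.
KEEP = {"lt", "gt", "quot", "amp"}
# Decimal NCRs, hexadecimal NCRs, and named entities.
REFERENCE = re.compile(r"&(#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")

def to_html_output(text: str) -> str:
    """Escape only < > " and &, leaving all other characters as they are."""
    return "".join(MARKUP.get(ch, ch) for ch in text)

def from_html_input(text: str) -> str:
    """Resolve character references except the four markup entities."""
    def resolve(match: re.Match) -> str:
        if match.group(1) in KEEP:
            return match.group(0)            # keep the markup entity escaped
        return html.unescape(match.group(0))  # unknown names come back unchanged
    return REFERENCE.sub(resolve, text)

print(to_html_output('<a href="x">R&D</a>'))
print(from_html_input("caf&eacute; &lt;b&gt; &#x5D0;"))
```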

Percent-encoding for URIs
Input:  Can be a mix of text and escapes. Only percent escapes are converted.
Output:  Characters allowed in URI syntax are not converted.

Hexadecimal NCRs
Input: Can be a mix of text and escapes. Only hexadecimal NCRs are converted.
Output: By default, everything except ASCII characters is converted. You can use the checkboxes to specify whether ANSI (Latin1) characters remain unchanged, or whether all characters are converted.
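
A minimal Python sketch of this output rule is shown below. The keep parameter and its values are hypothetical stand-ins for the checkboxes, and the digits are not zero-padded, which may differ from what the page produces.

```python
def to_hex_ncrs(text: str, keep: str = "ascii") -> str:
    """Convert characters to hexadecimal NCRs.

    keep="ascii"  leaves U+0000..U+007F unchanged (the default described above),
    keep="latin1" leaves U+0000..U+00FF unchanged,
    keep="none"   converts every character.
    """
    limit = {"ascii": 0x7F, "latin1": 0xFF, "none": -1}[keep]
    return "".join(
        ch if ord(ch) <= limit else f"&#x{ord(ch):X};" for ch in text
    )

print(to_hex_ncrs("Crêpes €"))            # Cr&#xEA;pes &#x20AC;
print(to_hex_ncrs("Crêpes €", "latin1"))  # Crêpes &#x20AC;
```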

Decimal NCRs
Input: Can be a mix of text and escapes. Only decimal NCRs are converted.
Output: By default, everything except ASCII characters is converted. You can use the checkboxes to specify whether ANSI (Latin1) characters remain unchanged, or whether all characters are converted.
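
The input side, where only one kind of escape is converted and everything else is left alone, could look roughly like this in Python. The function name is hypothetical, and leaving out-of-range values as written is an assumption, not documented behaviour.

```python
import re

# Matches decimal NCRs only; hexadecimal NCRs and other escapes do not match.
DECIMAL_NCR = re.compile(r"&#([0-9]+);")

def decode_decimal_ncrs(text: str) -> str:
    """Replace decimal NCRs with characters; leave all other text untouched."""
    def resolve(match: re.Match) -> str:
        cp = int(match.group(1))
        # Assumption: values that are not valid scalar values stay as written.
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return match.group(0)
        return chr(cp)
    return DECIMAL_NCR.sub(resolve, text)

print(decode_decimal_ncrs("Cr&#234;pes &#xEA; U+00EA"))  # Crêpes &#xEA; U+00EA
```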

Unicode U+hex notation
Input:  Can be a mix of text and escapes. Only U+hex escapes are converted.
Output: By default, everything except ASCII characters is converted. You can use the checkboxes to specify whether ANSI (Latin1) characters remain unchanged, or whether all characters are converted. Adjacent escapes (only) are separated by a space.
Other:  To separate a sequence of characters by spaces, paste the characters into the Mixed field or Characters field and click Convert. Then click Convert immediately in the Unicode U+hex notation field and look in the Characters field for the result.
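
The rule that only adjacent escapes are separated by a space can be sketched in Python as below. The function name is hypothetical, the sketch covers the default case only (ASCII left unchanged), and padding to at least four hex digits is assumed from the usual U+ convention.

```python
def to_u_plus(text: str) -> str:
    """Write non-ASCII characters as U+hex, spacing only adjacent escapes."""
    out = []
    previous_was_escape = False
    for ch in text:
        if ord(ch) > 0x7F:
            if previous_was_escape:
                out.append(" ")  # separator between two neighbouring escapes
            out.append(f"U+{ord(ch):04X}")
            previous_was_escape = True
        else:
            out.append(ch)       # ASCII is left unchanged
            previous_was_escape = False
    return "".join(out)

print(to_u_plus("aאבb"))  # aU+05D0 U+05D1b
```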

0x... hexadecimal notation
Input:  Can be a mix of text and hexadecimal 0x... escapes. Only 0x... escapes are converted.
Output:  By default, everything except ASCII characters is converted. You can use the checkboxes to specify whether ANSI (Latin1) characters remain unchanged, or whether all characters are converted. Adjacent escapes (only) are separated by a space.
Other:  To separate a sequence of characters by spaces, paste the characters into the Mixed field or Characters field and click Convert. Then click Convert immediately in the 0x... notation field and look in the Characters field for the result.

Hexadecimal code points
Input:  Can be a mix of text and hex numbers. Only hex numbers are converted.
Output:  By default, hex numbers only, all separated by spaces. If you use the checkboxes to leave ASCII or Latin1 (ANSI) characters unchanged, a space is inserted before a code point if the character just before it is in the range [A-Za-z0-9]. (Note that you will then get a different result in the other boxes if you immediately click Convert above this box.)
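
A small Python sketch of the default behaviour follows. The function names are hypothetical, the numbers are not zero-padded (the page's padding may differ), and the input function assumes the text consists of hex numbers only. The last line shows why bare letters such as 'ab' are risky: they parse as a valid hex number.

```python
def to_hex_code_points(text: str) -> str:
    """Write every character as a hex code point, separated by spaces."""
    return " ".join(f"{ord(ch):X}" for ch in text)

def from_hex_code_points(numbers: str) -> str:
    """Convert space-separated hex numbers back to characters."""
    return "".join(chr(int(token, 16)) for token in numbers.split())

print(to_hex_code_points("Crêpes"))               # 43 72 EA 70 65 73
print(from_hex_code_points("43 72 EA 70 65 73"))  # Crêpes
print(from_hex_code_points("ab"))                 # « (U+00AB), not the letters a and b
```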

Decimal code points
Input:  Can be a mix of text and decimal numbers. Only decimal numbers are converted.
Output:  By default, decimal numbers only, all separated by spaces. If you use the checkboxes to leave ASCII or Latin1 (ANSI) characters unchanged, a space is inserted before a code point if the character just before it is in the range [A-Za-z0-9]. (Note that you will then get a different result in the other boxes if you immediately click Convert above this box.)

UTF-8 code units
Input:  Must be hexadecimal byte codes only, separated by spaces.
Output:  2-digit hexadecimal numbers, one per byte, representing the bytes that make up the text when encoded in UTF-8.
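
For reference, the relationship between characters and UTF-8 byte codes can be reproduced with Python's built-in codec. The function names are hypothetical, and separating the output bytes with spaces is an assumption based on the input format.

```python
def to_utf8_units(text: str) -> str:
    """Encode text as UTF-8 and write each byte as two hex digits."""
    return " ".join(f"{byte:02X}" for byte in text.encode("utf-8"))

def from_utf8_units(units: str) -> str:
    """Decode space-separated hexadecimal byte codes as UTF-8."""
    return bytes(int(token, 16) for token in units.split()).decode("utf-8")

print(to_utf8_units("ê€"))       # C3 AA E2 82 AC
print(from_utf8_units("C3 AA"))  # ê
```

A supplementary character such as U+1F600 takes four bytes in this form (F0 9F 98 80).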

UTF-16 code units
Input:  Must be hexadecimal code units only, separated by spaces.
Output:  Hexadecimal numbers of 1 to 4 digits representing the UTF-16 code units for the text converted. Supplementary characters are represented by two code units.
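
The split of a supplementary character into two code units follows the standard UTF-16 surrogate arithmetic, sketched here in Python. The function names are hypothetical; the unpadded hex output follows the "1 to 4 digits" description above.

```python
def utf16_units(text: str) -> list[int]:
    """Return the UTF-16 code units for text."""
    units = []
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            # Supplementary character: split into high and low surrogates.
            offset = cp - 0x10000
            units.append(0xD800 + (offset >> 10))
            units.append(0xDC00 + (offset & 0x3FF))
        else:
            units.append(cp)
    return units

def to_utf16_units(text: str) -> str:
    """Write the code units as hex numbers of 1 to 4 digits."""
    return " ".join(f"{unit:X}" for unit in utf16_units(text))

print(to_utf16_units("A€\U0001F600"))  # 41 20AC D83D DE00
```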

JavaScript escapes
Input:  Can be a mix of text and escapes. Only JavaScript escapes are converted. Accepts escapes as used in JavaScript, Java and C.
Output:  By default, everything except ASCII characters is converted. You can use the checkboxes to specify whether ANSI (Latin1) characters remain unchanged, or whether all characters are converted. Default output to this field is specifically JavaScript compliant, though this is valid Java code too (a small number of Java-only named escapes such as \e are rendered as numeric escapes). In C-style escapes, supplementary characters are rendered by a single number, rather than two adjacent surrogate code point numbers. You can change supplementary character representations to the C style using the Use C-style Supp. Chars. checkbox.
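
The difference between the two supplementary character styles can be sketched in Python as follows. The function name and the c_style flag are hypothetical, the \UXXXXXXXX form is the standard C universal character name and is assumed to be what the checkbox produces, and ASCII control characters and quotes are passed through unescaped for brevity.

```python
def to_js_escapes(text: str, c_style: bool = False) -> str:
    """Escape non-ASCII characters as \\uXXXX (or \\UXXXXXXXX if c_style)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0x7F:
            out.append(ch)                 # ASCII is left unchanged
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")
        elif c_style:
            out.append(f"\\U{cp:08X}")     # one number for the whole code point
        else:
            offset = cp - 0x10000          # two adjacent surrogate escapes
            high = 0xD800 + (offset >> 10)
            low = 0xDC00 + (offset & 0x3FF)
            out.append(f"\\u{high:04X}\\u{low:04X}")
    return "".join(out)

print(to_js_escapes("ê\U0001F600"))                # \u00EA\uD83D\uDE00
print(to_js_escapes("ê\U0001F600", c_style=True))  # \u00EA\U0001F600
```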

CSS escapes
Input: Can be a mix of text and escapes.
Output: Does not escape non-control ASCII characters. Output content uses 6-digit escape forms followed by a space for supplementary characters, and 4-digit escapes followed by a space for all other escaped characters.
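
A minimal Python sketch of this output rule, with a hypothetical function name; treating U+0020 to U+007E as the non-control ASCII range is an assumption.

```python
def to_css_escapes(text: str) -> str:
    """Escape control and non-ASCII characters using CSS backslash escapes."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp <= 0x7E:
            out.append(ch)               # non-control ASCII is not escaped
        elif cp > 0xFFFF:
            out.append(f"\\{cp:06X} ")   # 6 digits plus a space for supplementary characters
        else:
            out.append(f"\\{cp:04X} ")   # 4 digits plus a space otherwise
    return "".join(out)

print(to_css_escapes("aê\U0001F600b"))  # a\00EA \01F600 b
```

The trailing space terminates the escape, so a following character such as b is not read as another hex digit.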

Developed by: Richard Ishida. Please report any problems to me at ishida@w3.org.

Last update 2012-08-08 12:51 GMT.