An Introduction to Writing Systems & Unicode


Typographic differences

Character size & line height

Glyph complexity


In this and the following slides I look at the minimum number of pixels required on something like an LCD panel to achieve a quality rendering of characters in a number of scripts. This has implications for line height and pixel resolution on screen. It also tends to affect the use of bolding and italicization, since they require additional pixels for rendering.

English generally fits adequately in a 6x8 pixel block.

Unfortunately this is not true for other languages based on the Latin script. Accents over or under upper case letters in particular tend to demand additional pixels.

Japanese typically uses a 16x16 pixel square that includes a gutter of 1 pixel horizontally and vertically. I have seen 14x14 pixel implementations, however, that are deemed acceptable. Such arrangements require the omission of some strokes for the more complicated characters, but Japanese readers are still able to understand what character is intended.


Most Thai characters require only around 7 pixels in width, although there are a small number that may require around twice that.

In height, however, Thai demands a minimum of 22 pixels (plus more inter-line spacing than is usual for Latin text).


In Chinese (especially Traditional) there are many hundreds of characters that cannot be rendered in a 16x16 pixel grid. An adequate size is likely to involve around 24x24 pixels. (Count the lines and spaces on the example on this slide and you will see that a minimal representation of this character requires more than that.)
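
As a rough sketch of what these cell sizes mean for storage, the following Python computes the size of a 1-bit-per-pixel bitmap for each cell quoted above. The `CELL_SIZES` table and the `bitmap_bytes` helper are illustrative names of my own, the Thai entry uses the common 7-pixel width, and padding each row to a whole byte is an assumption about the bitmap layout rather than a property of any particular panel.

```python
# Cell sizes (width, height) in pixels, taken from the figures quoted in the text.
# Thai uses the common ~7 pixel width; a few characters need about twice that.
CELL_SIZES = {
    "English": (6, 8),
    "Japanese": (16, 16),
    "Thai": (7, 22),
    "Chinese": (24, 24),
}


def bitmap_bytes(width, height):
    """Bytes needed for a 1-bit-per-pixel glyph.

    Assumption: each pixel row is padded up to a whole number of bytes.
    """
    bytes_per_row = (width + 7) // 8
    return bytes_per_row * height


for script, (width, height) in CELL_SIZES.items():
    print(f"{script}: {width}x{height} pixels, {bitmap_bytes(width, height)} bytes per glyph")
```

Under these assumptions a 24x24 cell takes 72 bytes, nine times the 8 bytes of a 6x8 cell, before the size of the character repertoire is even considered.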

Line height & inter-line spacing


Even after supplying sufficient numbers of pixels to accommodate the complex shapes we have seen, many of these scripts demand additional inter-line spacing.

The example on the left of this slide shows what 16x16 pixel characters would look like without additional inter-line spacing. There are two issues here:

  1. the characters appear to run into each other and are difficult to read (especially if underlining is applied)

  2. it is not immediately apparent whether the text should be read vertically or horizontally.

Additional spacing as shown on the right alleviates both of these problems.
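
The cost of that additional spacing can be estimated with a little arithmetic. The sketch below counts how many whole lines of text fit on a panel for a given cell height and extra inter-line spacing. The function name `lines_on_panel` and the 64-pixel panel height are hypothetical, chosen only for illustration.

```python
def lines_on_panel(panel_height, cell_height, leading=0):
    """Whole text lines that fit on a panel of the given pixel height.

    `leading` is the number of extra pixels inserted between consecutive
    lines; no extra space is needed after the last line.
    """
    if panel_height < cell_height:
        return 0
    return 1 + (panel_height - cell_height) // (cell_height + leading)


PANEL_HEIGHT = 64  # hypothetical panel height in pixels

for leading in (0, 2, 4):
    count = lines_on_panel(PANEL_HEIGHT, 16, leading)
    print(f"16-pixel cells, {leading} pixels of extra spacing: {count} lines")
```

With 16-pixel cells, a 64-pixel-high panel holds four lines with no extra spacing but only three once 4 pixels of spacing are added between lines.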

Baseline alignment


Another issue to be borne in mind concerns baseline alignment. The slide shows a number of possible types of baseline.

If using a font that includes more than one script this is not usually an issue since the font designer will normally ensure an appropriate match between baselines and character sizes for glyphs from different scripts. If, however, you are mixing scripts using different fonts, it is important to ensure that alignment is appropriate.
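
A minimal sketch of the alignment step when mixing fonts, assuming each font reports its ascent and descent in pixels relative to the same kind of baseline. The `FontMetrics` class, the `align_on_baseline` function and the metric values are all hypothetical. Scripts that use a different baseline type (hanging or ideographic, for example) would need an extra per-font offset that is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class FontMetrics:
    name: str
    ascent: int   # pixels above the baseline
    descent: int  # pixels below the baseline


def align_on_baseline(fonts):
    """Return (line_height, offsets) so that all baselines coincide.

    offsets maps each font name to the number of pixels its glyph box
    must be pushed down from the top of the line.
    """
    max_ascent = max(f.ascent for f in fonts)
    max_descent = max(f.descent for f in fonts)
    offsets = {f.name: max_ascent - f.ascent for f in fonts}
    return max_ascent + max_descent, offsets


# Hypothetical metrics for two fonts used on the same line.
latin = FontMetrics("latin", ascent=10, descent=3)
thai = FontMetrics("thai", ascent=16, descent=6)
line_height, offsets = align_on_baseline([latin, thai])
print(line_height, offsets)
```

The line has to be tall enough for the largest ascent plus the largest descent, which is one reason that mixing scripts tends to increase line height.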

Proportional spacing


Whereas East Asian characters tend to use mono-spaced glyphs as the default, a script such as Arabic is extremely difficult to fit into a mono-spaced font. Arabic really demands proportionally-spaced glyphs.

In addition, scripts that use combining characters require the ability to overlap characters. This may cause significant problems for LCD panels.
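
To get a feel for how often such overlapping is needed, the following sketch uses Python's standard `unicodedata` module to find the characters in a string that belong to a mark category (Mn, Mc or Me). The helper name `find_marks` is my own. The general category is used rather than the combining class, because some marks (several Thai vowel signs, for example) have a combining class of zero.

```python
import unicodedata


def find_marks(text):
    """Return (index, character name) for every mark in the text.

    A mark is any code point whose general category starts with 'M';
    such characters are normally drawn over, under or around a base
    character instead of occupying a cell of their own.
    """
    return [
        (i, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch).startswith("M")
    ]


samples = {
    "Latin": "e\u0301",         # e + combining acute accent
    "Thai": "\u0E01\u0E34",     # ko kai + vowel sign sara i
    "Arabic": "\u0628\u064E",   # beh + fatha
}
for script, text in samples.items():
    print(script, find_marks(text))
```

Each of these two-character samples occupies a single horizontal position when rendered, which is exactly what a display built from fixed, non-overlapping cells struggles with.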




The term ruby is used to refer to annotations typically occurring in East Asian scripts. In Japanese this is called furigana.

Furigana is typically used to provide phonetic transcriptions (in hiragana) of obscure characters, or characters that the reader is not expected to be familiar with. For example, it is widely used in educational materials and children’s texts.

Phonetic transcription normally appears above horizontal text. Sometimes semantic information is provided below the horizontal text.


In vertical text, above equates to right, and below equates to left.



Such annotation in Traditional Chinese uses bopomofo to indicate the pronunciation, and rather than appearing above the main text, the annotation is included vertically to the right of each character, whether the main text is vertical or horizontal.

Interlinear annotation characters


Unicode provides special control characters that can be used to indicate what is ruby in plain text, as shown on the slide.

NOTE: these characters should not be used in a markup language such as HTML if a markup-based alternative is provided.
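
The control characters in question are U+FFF9 INTERLINEAR ANNOTATION ANCHOR, U+FFFA INTERLINEAR ANNOTATION SEPARATOR and U+FFFB INTERLINEAR ANNOTATION TERMINATOR. As a sketch of how plain text marked up this way might be converted before it is placed in HTML, the function below rewrites each annotated run as a `ruby` element. The function name `to_ruby_html` is hypothetical, and the sketch assumes one annotation per base text with no nesting.

```python
import re
from html import escape

ANCHOR = "\uFFF9"      # INTERLINEAR ANNOTATION ANCHOR: starts the base text
SEPARATOR = "\uFFFA"   # INTERLINEAR ANNOTATION SEPARATOR: starts the annotation
TERMINATOR = "\uFFFB"  # INTERLINEAR ANNOTATION TERMINATOR: ends the annotation

# anchor, base text, separator, annotation text, terminator
_ANNOTATED_RUN = re.compile(
    "\uFFF9([^\uFFF9-\uFFFB]*)\uFFFA([^\uFFF9-\uFFFB]*)\uFFFB"
)


def to_ruby_html(text):
    """Replace annotated runs with <ruby> markup.

    Assumption: a single annotation per base text and no nesting.
    The text is HTML-escaped first; the control characters are not
    affected by escaping.
    """
    def replace(match):
        base, annotation = match.group(1), match.group(2)
        return f"<ruby>{base}<rt>{annotation}</rt></ruby>"

    return _ANNOTATED_RUN.sub(replace, escape(text))


plain = ANCHOR + "漢字" + SEPARATOR + "かんじ" + TERMINATOR
print(to_ruby_html(plain))
```

Converting in this direction follows the note above: once the text is in HTML, the markup carries the annotation and the control characters are no longer present.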

Other typographic differences



This section gathers together a small number of additional typographic features that may differ across scripts.

This slide illustrates alternative, native Japanese, methods of emphasizing text. In the top example, small dots (called wakiten) are placed above the characters to be emphasized – one dot per character. (In vertical text they appear to the right of the character.)

The second example shows emphasis being indicated by the use of a light shaded box behind the relevant characters. This is called amikake.

Note also, as was mentioned earlier, that emphasis can be achieved in German by widening the spaces between characters.


Emphasis in Cyrillic is commonly achieved by italicization, as in Latin text. However, italicization of Cyrillic typically changes certain glyphs in a systematic way, so it cannot be achieved simply by distorting the non-italicized text slightly. Firstly, many characters adopt a more rounded shape.


Other Cyrillic letters adopt a very different base shape during italicization.

Kumimoji and warichu


As an example of other typographic effects that may need to be supported, Japanese typography frequently uses approaches such as kumimoji and warichu.

Kumimoji (top line on the slide) refers to composites consisting of up to 5 characters that are reduced in size and combined to fit within the space of a single character. Such arrangements can be created as needed by the user if there is a capability to display the text correctly.

Warichu (bottom line on the slide) is a run of text of reduced font size that appears inside of a line of text as two lines of equal height and length.

(These examples and definitions are taken from the CSS3 Text Module.)
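
Separately from composites created on the fly, Unicode encodes a number of precomposed "squared" characters of this kind for compatibility. As a small illustration, and not a description of how kumimoji are laid out, the sketch below uses compatibility normalization (NFKC) to expand one of them, U+337F SQUARE CORPORATION, into the four full-size characters it stands for. The function name `expand_squared` is my own. Note that NFKC also alters other compatibility characters, such as full-width letters, so it is not a targeted kumimoji operation.

```python
import unicodedata

SQUARE_CORPORATION = "\u337F"  # one character cell containing four small kanji


def expand_squared(text):
    """Expand compatibility characters to their full-size equivalents.

    This applies NFKC to the whole string, so it affects every
    compatibility character in it, not only squared composites.
    """
    return unicodedata.normalize("NFKC", text)


expanded = expand_squared(SQUARE_CORPORATION)
print(len(SQUARE_CORPORATION), len(expanded), expanded)
```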

Content created February, 2003. Last update 2014-10-17 18:56 GMT