An Introduction to Writing Systems & Unicode


Large character sets

Complex script rendering

Text direction

Text boundaries & wrapping

Typographic differences

Sorting & case conversion

About the tutorial

Intended audience

Anyone who wants to better understand how scripts work in computerized environments, particularly with regard to Unicode. The material should be accessible to a wide audience, from software engineers to managers.

While the tutorial is perfectly accessible to beginners, it has also attracted very good reviews from people at an intermediate and advanced level, due to the breadth of scripts discussed. No previous knowledge is assumed.


Why should you read this?

When planning to introduce products into new markets, it is important to understand the impact of having to support different scripts. The tutorial will make clear that this is rarely a trivial issue and that, if you need to implement support, it may involve decisions at a very early stage in the design process.

This tutorial is particularly useful for people who are new to Unicode, in that it provides an overview of the basics in the context of


This material was initially developed for delivery as a regularly featured tutorial at Internationalization & Unicode Conferences.

The tutorial will provide you with an understanding of key requirements for implementing writing systems in information technology. It will do this by examining real examples of a wide range of modern scripts to discover features that a computerized implementation must support. It will also make special reference, where appropriate, to how the Unicode Standard points the way forward for meeting these requirements.

The tutorial does not provide detailed coding advice, but does provide the essential background information you need to understand the fundamental issues. It will also constitute an excellent orientation for newcomers to the topic, providing a wide-ranging framework that assists in assimilating further, more detailed and specific information.

Naturally, given the tutorial format, this is an ambitious approach, and it means that we cannot go into great detail on any particular topic. If you would like to understand a topic better, there are a couple of excellent resources cited at the end of the tutorial, one of which is the very readable Unicode Standard itself.

Scripts addressed and conventions

We will organize the material in the tutorial by concept, rather than by script. To help you, the script or scripts to which the concept applies will always be listed at the top right of the slide.

The main scripts we will use as examples include:

  • Greek
  • Cyrillic
  • Japanese
  • Chinese (Simplified & Traditional)
  • Korean
  • Arabic
  • Hebrew
  • Thai
  • Indic (represents North & South Indian scripts)

The tutorial covers most of the key features of each of these scripts.

An objective of the tutorial is to introduce a number of terms used to describe script features or characters. These terms are called out under the slide title on slides where they are introduced.

There is a set of web pages with sample text in each of the main scripts we will address. Each of the sample pages is a translation of the same English text. We will use these samples to illustrate as many of the points made as possible. That way you will be able to experiment with the examples yourself. In fact, where I have taken an example from a sample page I have typically included the text of that sample on the slide to help you locate real instances more easily.

If you use these examples for your own material, please ensure that you cite this paper and the web site as a source reference.

Key sources

The top two sources provide very accessible information if you wish to delve deeper into most of the topics covered in this tutorial.


Available at:

Content created February, 2003. Last update 2014-10-17 19:03 GMT