An Introduction to Writing Systems & Unicode

Introduction

Large character sets

Complex script rendering

Text direction

Text boundaries & wrapping

Typographic differences

Sorting & case conversion

About the tutorial

Intended audience

Anyone who wants to better understand how scripts work in computerised environments, and more particularly with regard to Unicode. The material should be accessible to a wide audience, from software engineers to managers.

While the tutorial is perfectly accessible to beginners, it has also attracted very good reviews from people at intermediate and advanced levels, due to the breadth of scripts discussed. No previous knowledge is assumed.


Why should you read this?

When planning to introduce products into new markets it is important to understand the impact of having to support different scripts. The tutorial will make clear that this is rarely a trivial issue, and that implementing support may require decisions at a very early stage in the design process.

This tutorial is particularly useful for people who are new to Unicode, in that it provides an overview of the basics in the context of real world examples.

Objectives

This material was initially developed for delivery as a regularly-featured tutorial at Internationalization & Unicode Conferences.

The tutorial will provide you with an understanding of key requirements for implementing writing systems in information technology. It will do this by examining real examples of a wide range of modern scripts to discover features that a computerized implementation must support.

It will also introduce many of the key terms used to describe script features or characters in the Unicode Standard. When a slide introduces a particular concept, relevant terms are shown under the slide title.

The tutorial does not provide detailed coding advice, but it does provide the essential background you need to understand the fundamental issues. It also serves as an orientation for newcomers to the topic, providing a wide-ranging framework that makes it easier to absorb more detailed and specific information later.

Naturally, given the tutorial format, this is an ambitious approach, and it means that we cannot go into great detail on any particular topic. If you would like to understand a topic better, there are a couple of excellent resources cited at the end of the tutorial, one of which is the very readable Unicode Standard itself.

Scripts addressed and Conventions

We will organize the material in the tutorial by concept, rather than by script. To help you, the script or scripts to which the concept applies will always be listed at the top right of the slide.

The main scripts we will use as examples include:

  • Greek
  • Cyrillic
  • Japanese
  • Han (Simplified & Traditional)
  • Hangul
  • Arabic
  • Hebrew
  • Indic (represents North & South Indian scripts)
  • Thai

The tutorial covers most of the key features of each of these scripts.

An objective of the tutorial is to introduce a number of terms used to describe script features or characters. These terms are called out under the slide title on slides where they are introduced.

Supporting pages

There is a set of web pages with sample text in each of the main scripts we will address. We will use these samples to illustrate as many of the points made as possible. That way you will be able to experiment with the examples yourself.

Where I have taken an example from a sample page, I have typically included the text of that sample on the slide to help you locate real instances more easily.

In addition, there is a table summarising script information for many additional scripts, organized by language, which may be of interest.

Feel free to use these examples for your own material. If you would like to supply a translation of this English text for a script I have not yet summarised, please send it to ishida@w3.org.

Finally, if you want to explore the Unicode character set there are various tools available. The one I use during the tutorial is called UniView. It allows you to look up and see characters (using graphics or fonts) and their property information, view whole character blocks or custom ranges, select characters to paste into your document, paste in and identify unknown characters, search for characters, highlight character types, and more.
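If you would rather inspect unknown characters programmatically than in a browser tool, the sketch below shows one way to do it. It is not part of UniView. It uses Python's standard `unicodedata` module to report each character's code point, Unicode general category and character name. The helper name `describe` and its tab-separated output format are illustrative assumptions.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Hypothetical helper: one line per character with code point,
    character, general category and Unicode name, separated by tabs."""
    lines = []
    for ch in text:
        # unicodedata.name() raises ValueError for code points without a
        # name (e.g. control characters), so pass a fallback value.
        name = unicodedata.name(ch, "<unnamed>")
        # General category, e.g. 'Lu' (uppercase letter), 'Lo' (other
        # letter), 'Mn' (non-spacing mark), 'Mc' (spacing combining mark).
        category = unicodedata.category(ch)
        lines.append(f"U+{ord(ch):04X}\t{ch}\t{category}\t{name}")
    return lines


if __name__ == "__main__":
    # The Hindi word "हिन्दी", written in the Devanagari script.
    for line in describe("हिन्दी"):
        print(line)
```

For this Hindi word, the output lists six code points in storage order. DEVANAGARI VOWEL SIGN I (U+093F) is stored after the consonant it follows in speech, even though it is displayed to the left of that consonant.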

Key sources

The top two sources provide very accessible information if you wish to delve deeper into most of the topics covered in this tutorial.


Available at: rishida.net/docs/unicode-tutorial/

Content created February, 2003. Last update 2014-10-21 8:13 GMT