Dochula Pass, Bhutan

These are notes on using CSS @font-face to gain finer control over the fonts applied to characters in particular Unicode ranges of your text, without resorting to additional markup. Possibilities and problems.

Changing the font used for certain characters

Most non-English fonts mix glyphs from different writing systems. Usually the font contains glyphs for Latin characters plus a non-Latin script, for example English+Japanese, or English+Thai, etc.

Normally the font designer will take care to harmonise the Latin script glyphs with the non-Latin, but there may be cases where you want to change the design of the glyphs for, say, and embedded script without adding markup to your page.

For example, if I apply the MS-Mincho font to some content in Japanese with embedded Latin text I’m likely to see the following:

Let’s suppose I’d like the English text to appear in a different, proportionally-spaced font. I could put markup around the English and set a class on the markup to apply the font I want, but this is very time consuming and bloats your code.

An alternative is to use @font-face. Here is an example:

@font-face { 
  font-family: myJapanesefont;
  src: local(MS-Mincho);
  }
@font-face {
  font-family: myJapanesefont;
  src: local(Gentium);
  unicode-range: U+41-5A, U+61-7A, U+C0-FF;
  }
p { 
  font-family: myJapanesefont; 
  }

Note: When specifying src the local() keyword indicates that font-face should look for the font on the user’s system. Of course, to improve interoperability, you may want to specify a number of alternatives here, or a downloadable WOFF font. The most interoperable value to use for local() is the Postscript name of the font. (On the Mac open Font Book, select the font, and choose Preview > Show Font Information to find this.)

The result would be:

The first font-face declaration associates the MS-Mincho font with the name ‘myJapanesefont’. The second font-face declaration associates the Gentium font with the Unicode code points in the Latin-1 letter range (of course, you can extend this if you use Latin characters outside that range and they are covered by the font).

Note how I was careful to set the unicode-range values to exclude punctuation (such as the exclamation mark) that would be used by (and harmonised with) the Japanese characters.

Adding support for new characters to a font

You can use the same approach for fonts that don’t have support for a particular Unicode range.

For example, the Nafees Nastaliq font has no glyphs for the Latin range (other than digits), so the browser falls back to the system default.

With the following code, I can produce a more pleasant font for the ‘W3C’ part:

@font-face {; 
  font-family: myUrduFont;
  src: local(NafeesNastaleeq);
  }
@font-face {
  font-family: myUrduFont;
  src: local(BookAntiqua);
  unicode-range: U+30-FF;
  }
div p { 
  font-family: myUrduFont; 
  font-size: 60px;
  }

A big fly in the ointment

If you look at the ranges in the unicode-range value, you’ll see that I kept to just the letters of the alphabet in the Japanese example, and the missing glyphs in the Urdu case.

There are a number of characters that are used by all scripts, however, and these cause problems because you can’t apply fonts based on the context – even if you could work out what that context was.

In the case of the Japanese example above, numbers are left to be rendered by the Mincho font, but when those characters appear in the Latin text, they look incorrectly sized. Look, for example, at the 3 in W3C below.

The same problem arises with spaces and punctuation marks. The exclamation mark was left in the Mincho font in the Japanese example because, in this case, it is part of the Japanese text. Punctuation of this kind, could however be associated with the Latin text. See the question mark in this example.

Even more problematic are the spaces in that example. They are too wide in the Latin text. In Urdu text we have the opposite problem, use Urdu space glyphs in Latin text and you don’t see them at all (there should be a gap between W3C and i18n below).

With my W3C hat on, I start wondering whether there are any rules we can use to apply different glyphs for some characters depending on the script context in which they are used, but then I realise that this is going to bring in all the problems we already have for bidi text when dealing with punctuation or spaces between flows of text in different scripts. I’m not sure it’s a tractable problem without resorting to markup to delimit the boundaries. But then, of course, we end up right back where we started.

So it seems, disappointingly, that the unicode-range property is destined to be of only limited usefulness for me. That’s a real shame.

Another small issue

The examples don’t show major problems, but I assume that sometimes the fonts you want to bring together using font-face will have very different aspect ratios, so you may need to use something like font-size-adjust to balance the size of the fonts being used.

Browser support

The CSS code above worked for me in Chrome and Safari on Mac OS X 10.6. but didn’t work in Firefox or Opera. Nor did it work in IE9 on Windows7.

There appears to be some confusion about XHTML1.0 vs XHTML5. Here is my best shot at an explanation of what XHTML5 is.

* This post is written for people with some background in MIME types and html/xml formats. In case that’s not you, this may give you enough to follow the idea: ‘served as’ means sent from a server to the browser with a MIME type declaration in the HTTP protocol header that says that the content of the page is HTML (text/html) or XML (eg. application/xhtml+xml). See examples and more explanations.

XHTML5 is an HTML5 document served as* application/xhtml+xml (or another XML mime type). The syntax rules for XHTML5 documents are simply those rules given by the XML specification. The vocabulary (elements and attributes) is defined by the HTML5 spec.

Anything served as text/html is not XHTML5.

Note that HTML5 (without the X) can be written in a style that looks like XML syntax. For example, using a / in empty elements (eg. <br/>), or using quotes around attributes. But code written this way is still HTML5, not XHTML5, if it is served as text/html.

There are normally other differences between HTML5 and XHTML5. For example, XHTML5 documents may have an XML declaration at the start of the document. HTML5 documents cannot have that. XHTML5 documents are likely to have a more complicated doctype (to facilitate XML processing). And XHTML5 documents will have an xmlns attribute on the html tag. There are a few other HTML5 features that are not compatible with XML, and must be avoided.

Similar differences existed between HTML 4.01 and XHTML 1.0. However, moving on from XHTML 1.0 will typically involve a subtle but significant shift in thinking. You might have written XHTML 1.0 with no intention of serving it as anything other than text/html. XHTML in the XHTML 1.0 sense tended to be seen largely as a difference in syntax; it was originally designed to be served as XML, but (with some customisations to suit HTML documents) could be, and usually was, served with an HTML mime type. XHTML in the XHTML5 sense, means HTML5 documents served with an XML mime type (and appropriate customisations to suit XML documents), ie. it’s the MIME type, not the syntax, that makes it XHTML.

Which brings us to Polyglot documents. A polyglot document contains markup that is the subset of HTML5 and XML that can be processed as either HTML or XHTML, and can be served as either text/html or application/xhtml+xml, ie. as either HTML5 or XHTML5, without any errors or warnings in either case. The polyglot spec defines the things which allow this compatibility (such as using no XML declaration, proper casing of element names, etc.), and which things to avoid. It also mandates at least one additional extra, ie. disallowing UTF-16 encoded documents.