This came up again recently in a discussion on the W3C i18n Interest Group list, and I decided to put my thoughts in this post so that I can point people to them easily.
I think HTML4 and HTML5 should continue to support <b> and <i> tags, for backwards compatibility, but we should urge caution regarding their use and strongly encourage people to use <em> and <strong> or elements with class="…" where appropriate. (I reworded this 2008-02-01)
Here are a couple of reasons I say that:
- I constantly see people misusing these tags in ways that can make localization of content difficult.
For example, just because an English document may use italicization for emphasis, document titles and foreign words, it doesn’t follow that a Japanese translation of the document will use a single presentational convention for all three. Japanese authors may avoid both italicization and bolding, since their characters are too complicated to look good at small sizes with these effects. Japanese translators may find that the content communicates better if they use wakiten (boten marks) for emphasis, but corner brackets for 『 document names 』, and double angle brackets for 《 foreign words 》. These are common Japanese typographic approaches that we don’t use in English.
The problem is that, if the English author has used <i> tags everywhere (thinking about the presentational rendering he/she wants in English), the Japanese localizer will be unable to easily apply different styling to the different types of text.
The problem could be avoided if semantic markup were used. If the English author had used <em>..</em> and <span class="doctitle">...</span> and <span class="foreignword">..</span> to distinguish the three cases, the localizer could easily change the CSS to achieve a different effect for each of these items, one at a time.
Of course, over time this is equally relevant to pages that are monolingual. Suppose your corporate publishing guidelines change, and proclaim that bold is better than italics for document names. With semantically marked-up HTML, you can change a whole site with one tiny edit to the CSS. If you had used <i> tags throughout, however, you’d have to hunt through every page for the relevant <i> tags and change them individually, so that you didn’t apply the same style change to emphasis and foreign words too.
- Allowing authors to use <b> and <i> tags is also problematic, in my mind, because it keeps authors thinking in presentational terms, rather than helping them move to properly semantic markup. At the very least, it blurs the ideas. To an author in a hurry, it is also tempting to just slap one of these tags on the text to make it look different, rather than to stop and think about things like consistency and future-proofing. (Yes, I’ve often done it too…)
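To make the localization point concrete, here is a minimal sketch of the semantic approach. It uses the doctitle and foreignword class names from the first point and assumes the Japanese translation is marked with lang="ja"; the sample sentences are made up for illustration. The default rules italicize all three kinds of text, as the English author wanted. The Japanese rules, selected with the CSS :lang() pseudo-class, drop the italics and apply the conventions described above: emphasis dots via the text-emphasis property, and brackets added with ::before and ::after. Browser support for text-emphasis varies, so treat this as illustrative rather than a tested recipe.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Semantic markup, per-language styling</title>
<style>
  /* Default (English) presentation: all three kinds of text are italic */
  em, .doctitle, .foreignword { font-style: italic; }

  /* Japanese: no italics; each kind of text gets its own convention.
     :lang(ja) matches elements whose inherited language is Japanese. */
  em:lang(ja), .doctitle:lang(ja), .foreignword:lang(ja) { font-style: normal; }
  em:lang(ja) { text-emphasis: filled dot; }           /* emphasis dots (wakiten) */
  .doctitle:lang(ja)::before    { content: "『"; }      /* document names */
  .doctitle:lang(ja)::after     { content: "』"; }
  .foreignword:lang(ja)::before { content: "《"; }      /* foreign words */
  .foreignword:lang(ja)::after  { content: "》"; }
</style>
</head>
<body>
<!-- Had the author used <i> for all three, the rules above could not tell them apart. -->
<p>Please read <span class="doctitle">Style Guide</span> <em>before</em>
you use the word <span class="foreignword">Zeitgeist</span>.</p>

<!-- The translation keeps the same elements and classes; only the CSS differs. -->
<p lang="ja"><span class="foreignword">Zeitgeist</span>という語を使う<em>前に</em>、<span class="doctitle">スタイルガイド</span>を読んでください。</p>
</body>
</html>
```

The monolingual case works the same way: if house style moves document titles from italics to bold, only the .doctitle rule changes (for example to font-style: normal; font-weight: bold), and emphasis and foreign words are left untouched.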

January 31st, 2008 at 6:33 pm
Is there really a difference between your points 1 and 2? Isn’t localization a specific application of the semantic vs presentation opposition?
January 31st, 2008 at 6:42 pm
In HTML5, as defined today, B, STRONG, I, and EM are entirely different from each other; they cover four different use cases. Under the HTML5 definitions, talking about working out whether one should be using EM or I, or STRONG or B, or EM or STRONG, indicates that one is not up to date with what the elements are defined as.
February 1st, 2008 at 2:01 pm
@Dom: I saw point one as a practical issue for localisation. Point 2 is more about authors changing their mindset, in general, about use of semantic vs presentation markup.
February 1st, 2008 at 2:47 pm
Hi Ian. Actually, I wrote this because I had just read the latest editor’s copy of the HTML5 WD.
I like how you have defined <em> and <strong>, and to a point I like the explanations of <i> and <b>. But they are still presented as somewhat vague presentational devices, in that they can cover a range of semantics: <i> is suggested for things “such as a taxonomic designation, a technical term, an idiomatic phrase from another language, a thought, a ship name, or some other prose”, and the examples suggest Latin names for flora and fauna and dream sequences; <b> is suggested for “key words in a document abstract, product names in a review, or other spans of text”, and the examples include special eye-catchers and lede sentences.
And then follows the phrase “whose typical typographic presentation is boldened/italicized”.
I’m trying to make the points that
Therefore…
The spec says that “The i/b element should be used as a last resort when no other element is more appropriate”, but I think this implies that once you have exhausted the elements offered by HTML5, you have run out of options. You haven’t: you could use class names to label things.
So I would suggest an alternative wording along the lines of “The i/b element should only be used as a last resort when no other element is available and you want the text to be visually distinct when no style sheet is present or one cannot be used. Even then, it should be treated only as a fallback device. It is much better to use an i/b element with a class name that describes the intent of the text, and to associate that class, where possible, with a rule in a style sheet.”
For example, I would prefer the spec to change its examples slightly like this:
If one document contained all these examples, I would now be free to restyle them individually and separately as I wish, without having to trawl through the HTML to make the changes.
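A minimal sketch of the class-name approach, using three of the use cases the spec lists for <i> and <b>: a taxonomic designation, a ship name, and a product name in a review. The class names taxonomy, shipname and productname, the ship and product names, and the particular styles are all hypothetical, chosen only to show the pattern. With no style sheet, the elements still fall back to italics and bold; with one, each kind of text can be restyled on its own.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Class names on i and b</title>
<style>
  /* One rule per intent: each can be restyled without touching the others */
  i.taxonomy    { font-style: italic; }                          /* Latin names stay italic */
  i.shipname    { font-style: normal; font-variant: small-caps; } /* hypothetical house style */
  b.productname { font-weight: normal; color: #036; }            /* hypothetical house style */
</style>
</head>
<body>
<!-- Without the style sheet, i and b still render as italics and bold -->
<p>The ship's cat, <i class="taxonomy">Felis silvestris catus</i>,
sailed on the <i class="shipname">Example Star</i>.</p>
<p>We tested the <b class="productname">Example Widget</b> for a week.</p>
</body>
</html>
```

Changing how ship names look is then a one-line edit to the i.shipname rule, with no need to trawl through the HTML.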
But I should probably now send this to the HTML5 list for discussion.
February 12th, 2008 at 1:22 am
Aren’t you assuming silently that the translation of a document is required to have the same element structure as the original? Whence such a requirement? Since (as you observe) it’s wrong-headed, why put up with such a requirement if others try to impose it?
February 12th, 2008 at 11:18 am
Hi Michael. I’m assuming silently that when a document is marked up semantically the translation will usually have the same element structure as the original – which is different from requiring it.
I’m not ruling out the possibility of the translation changing the structure. For example, French can express emphasis through words, and may not need italicisation (e.g. C’est moi qui l’ai fait! rather than <em>I</em> did it!). My experience, however, is that this is not very common, particularly when dealing with non-literary translations.
But I think that even if you wanted to change the element structure, you’d be better off working with semantic markup than presentational markup.