I use XMetal 4.6 for all my XHTML and XML authoring. As someone who has long advocated declaring the human language of your text whenever you create Web content, I’m finding XMetal’s spell checker both exciting and frustrating. Here are a few tips that might help others.

The exciting part is that XMetal figures out which spell checker to use based on the xml:lang language declarations. Given the following code:

<html xml:lang="en-us" lang="en-us" ... > 
<p>behavior localization color</p>
<p>behaviour localisation colour</p>
<p xml:lang="fr" lang="fr">ceci est français</p>
<p lang="gr" xml:lang="gr">Κάνοντας τον Παγκόσμιο Ιστό πραγματικά Παγκόσμιο</p>

The spell checker will flag three errors (behaviour, localisation, colour). The en-us value in the html tag causes it to use the US English dictionary, and the fr and gr values in the last two paragraphs cause it to use the French and Greek dictionaries, respectively, for the words in those elements. Great!

[Image: the spell checker in action.]

Note that, since XMetal is an XML editor, rather than an HTML editor, it is the value in the xml:lang attribute rather than the one in the lang attribute that counts here. For XHTML 1.0 content served as text/html, of course, you should use both.
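
To make the inheritance concrete, here is a minimal Python sketch of how a tool can work out the language in force for each element from xml:lang alone. This is not XMetal's code. The helper name effective_languages and the trimmed sample document are my own illustration, and it relies only on the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Trimmed-down version of the test markup (illustrative only).
SAMPLE = """<html xml:lang="en-us" lang="en-us">
<body>
<p>behavior localization color</p>
<p xml:lang="fr" lang="fr">ceci est français</p>
</body>
</html>"""


def effective_languages(element, inherited=""):
    """Yield (element, language) pairs, letting xml:lang inherit downwards."""
    # An element without its own xml:lang takes the value of its nearest ancestor.
    lang = element.get(XML_LANG, inherited)
    yield element, lang
    for child in element:
        yield from effective_languages(child, lang)


root = ET.fromstring(SAMPLE)
paragraph_langs = [lang for el, lang in effective_languages(root) if el.tag == "p"]
print(paragraph_langs)  # ['en-us', 'fr']
```

The plain lang attribute is never consulted here, which mirrors what an XML-oriented tool sees: only xml:lang carries meaning at the XML level.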

The following, however, are things you need to watch out for:

  1. If your html tag contains just xml:lang="en", your spell checking won’t be terribly effective, since all the English dictionaries (US, UK, Australian, and Canadian) will be used. This means that for the code above you will receive no error notifications, since each spelling occurs in at least one dictionary.

    This is logical enough, though it’s something you may not think about when spell checking. (Even if you go into the spell checker options and set, say, the US English spell checker, the language declaration will override that manual choice.)

  2. If you want to write British English, you would normally put en-GB in the xml:lang (because that’s what BCP 47 says you should do). Unfortunately this will produce no errors with our test case above! XMetal doesn’t recognise the GB subtag, and falls back to the behaviour of xml:lang="en". To get the behaviour you are expecting you have to put en-UK in xml:lang. This is really bad: UK is not the BCP 47 region subtag for the United Kingdom, so you end up marking up your content incorrectly. Presumably the same holds true for other languages. I see CF for Canadian French, rather than CA, SD for Swiss German rather than CH, etc.
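
The two pitfalls above can be modelled in a few lines of Python. This is only a sketch of the behaviour I observed, not how XMetal is actually implemented: the word lists are toy data, the function names active_regions and flagged are invented for illustration, and the code assumes the language part of the tag is English.

```python
# Toy word lists keyed by the region codes XMetal 4.6 appears to accept.
# Australian and Canadian English are left out to keep the example short.
ENGLISH_WORDS = {
    "US": {"behavior", "localization", "color"},
    "UK": {"behaviour", "localisation", "colour"},
}


def active_regions(xml_lang):
    """Return the English regions whose dictionaries apply to an xml:lang value."""
    _, _, region = xml_lang.partition("-")
    region = region.upper()
    if region in ENGLISH_WORDS:
        return [region]
    # Missing or unrecognised region (such as GB): every English dictionary is used.
    return sorted(ENGLISH_WORDS)


def flagged(words, xml_lang):
    """Return the words that appear in none of the active dictionaries."""
    accepted = set().union(*(ENGLISH_WORDS[r] for r in active_regions(xml_lang)))
    return [w for w in words if w not in accepted]


TEST_WORDS = ["behavior", "localization", "color",
              "behaviour", "localisation", "colour"]

for tag in ("en-us", "en", "en-GB", "en-UK"):
    print(tag, flagged(TEST_WORDS, tag))
```

In this model en and en-GB both flag nothing, because every spelling turns up in some dictionary, while en-us and en-UK each flag the other variety's three spellings.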

It’s good to see that the language markup is being used for spell-checking. However, it’s a case of two steps forward, one step back, which is a shame.

UPDATE: JustSystems have worked on this some more. See my later blog post for details.