The Language Subtag Lookup tool now links to a Wikipedia search for every language and script listed. This makes it easier to find information about languages, now that SIL limits access to its Ethnologue, and offers a new source of information for the scripts listed.

Picture of the page in action.

This update to the Language Subtag Lookup tool brings back the Check function that had been out of action since last January. The code had to be pretty much completely rewritten to migrate it from the original PHP. In the process, I added support for extension and private use tags, and added several more checks. I also made various changes to the way the results are displayed.

Give it a try with this rather complicated, but valid language tag: zh-cmn-latn-CN-pinyin-fonipa-u-co-phonebk-x-mytag-yourtag

Or try this rather badly conceived language tag, to see some error messages: mena-fr-latn-fonipa-biske-x-mylongtag-x-shorter
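To see how tags like these break down, here is a minimal Python sketch of the first step a checker might take: separating the main subtags from extensions and private use subtags. It is not the tool's own code; the function name `split_tag` is hypothetical, and the sketch assumes only the BCP 47 rules that a single-character subtag (a singleton) starts an extension and that everything after the first `x` is private use.

```python
def split_tag(tag):
    """Split a language tag into main subtags, extensions, and private use subtags."""
    subtags = tag.split("-")
    main, extensions, private_use = [], {}, []
    current = None  # singleton of the extension currently being collected

    for i, subtag in enumerate(subtags):
        if subtag.lower() == "x":
            # Everything after the first "x" belongs to the private use sequence.
            private_use = subtags[i + 1:]
            break
        if len(subtag) == 1:
            # Any other single-character subtag opens an extension.
            # (A repeated singleton would overwrite the earlier one in this sketch.)
            current = subtag.lower()
            extensions[current] = []
        elif current is not None:
            extensions[current].append(subtag)
        else:
            main.append(subtag)

    return main, extensions, private_use


main, extensions, private_use = split_tag(
    "zh-cmn-latn-CN-pinyin-fonipa-u-co-phonebk-x-mytag-yourtag"
)
print(main)         # the language, extlang, script, region, and variant subtags
print(extensions)   # extension subtags grouped by singleton
print(private_use)  # subtags following "x"
```

Run on the second tag above, the same function puts `mylongtag`, `x`, and `shorter` in the private use list, which is where a length check can then pick out the subtag that is too long.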

The IANA database information is up-to-date. The tool currently supports the IANA Subtag registry of 2014-12-17. It reports subtags for 8,081 languages, 228 extlangs, 174 scripts, 301 regions, 68 variants, and 26 grandfathered subtags.

>> Use it

About the tool: BCP 47 language tags are built from subtags in the IANA Subtag Registry. This tool helps you find or look up subtags and check for errors in language tags. It also provides information to guide your choices.

Latest changes: I reworked the informational text that accompanies macrolanguages, their encompassed languages, and extlang subtags. As part of that, I changed the code to allow specific cases to be highlighted, for example where legacy usage may dictate that the macrolanguage subtag (zh) is more useful for Mandarin Chinese than the more specific tags (cmn or zh-cmn).

I simplified the intro to the page, but added a link to the new article Choosing a Language Tag, which provides useful step-by-step guidelines on creating language tags.

I also changed the user interface somewhat. The input fields are easier to work with and take up less vertical space. Also, you can now submit a query by simply hitting return after typing into a field. I had originally required you to click on a submit button so that all values in other fields would be retained when the answer is shown – this was so that while checking various subtags you could build up a language tag in the Check field for later checking. I just found that the annoyance of continually having to resubmit after forgetting to click on the submit button wasn’t worth the extra functionality (and I was also encouraged to do so by feedback from Bert Bos).

>> Use it

I have added a number of new features to my lookup tool to help with choosing language tags. More information is now available when you look up subtags (such as what to use if the subtag is deprecated, and which subtags a macrolanguage encompasses), and there are more tests for well-formedness, with clearer explanations of the problem. Example.

This should make it a lot more useful to people who haven’t read BCP 47 and want to create language tags. Hopefully, in a short while, I’ll also write and link to an article that describes, step by step, how to use subtags from the ground up, to complement the tool.

For further assistance, you can now link from a language subtag result to the SIL Ethnologue, to make it easier to check whether that subtag really does refer to the language you were thinking of.

In addition, script subtag results link to Unicode blocks in UniView.

>> Use it

The IANA Subtag Registry has recently been updated to contain 220 extlang subtags and the ISO 639-3 language subtags, taking the total number of subtags to almost 8,000.

I have produced a new version of my lookup tool to help with language tagging. In addition to helping you find subtags and look up their meaning, it now helps you check the well-formedness of a language tag.

The tool provides access to all currently defined subtags, including the new extlang subtags.

Parsing language tags. In addition to trying to make the user interface more friendly, I also added the ability to parse hyphenated tags and discover their structure and check for errors. I’m not claiming with this release that the new parser field tests all the corner cases, but it should provide reports for most of the typical errors.

It reports errors for the following:

– subtags that are not in the registry (by type)
– incorrectly ordered subtags
– duplicate variant subtags and multiple subtags of other types
– overlong private use subtags
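The checks listed above can be sketched in a few lines of Python. This is not the tool's code: the names `classify` and `check_tag` are hypothetical, and the sketch recognises subtags by their shape alone (length and character class, as in the BCP 47 syntax), so it cannot report subtags that are missing from the registry. It also treats every single-character subtag other than `x` as unrecognised, since extensions are not handled.

```python
import re

ORDER = ["language", "extlang", "script", "region", "variant"]


def classify(subtag, position):
    """Guess a subtag's type from its shape only (no registry lookup)."""
    if position == 0:
        return "language" if re.fullmatch(r"[A-Za-z]{2,8}", subtag) else "unknown"
    if re.fullmatch(r"[A-Za-z]{3}", subtag):
        return "extlang"
    if re.fullmatch(r"[A-Za-z]{4}", subtag):
        return "script"
    if re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", subtag):
        return "region"
    if re.fullmatch(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", subtag):
        return "variant"
    return "unknown"


def check_tag(tag):
    """Return a list of error messages for a hyphenated language tag."""
    errors = []
    parts = tag.split("-")

    # Split off the private use sequence, which starts at the first "x".
    lowered = [p.lower() for p in parts]
    private = []
    if "x" in lowered:
        cut = lowered.index("x")
        parts, private = parts[:cut], parts[cut + 1:]
    for p in private:
        if len(p) > 8:
            errors.append(f"private use subtag '{p}' is longer than 8 characters")

    seen_variants, counts, highest = set(), {}, 0
    for position, subtag in enumerate(parts):
        kind = classify(subtag, position)
        if kind == "unknown":
            errors.append(f"'{subtag}' does not look like any known subtag type")
            continue
        rank = ORDER.index(kind)
        if rank < highest:
            errors.append(f"{kind} subtag '{subtag}' is out of order")
        highest = max(highest, rank)
        if kind == "variant":
            if subtag.lower() in seen_variants:
                errors.append(f"variant subtag '{subtag}' is duplicated")
            seen_variants.add(subtag.lower())
        else:
            counts[kind] = counts.get(kind, 0) + 1
            if counts[kind] > 1:
                errors.append(f"more than one {kind} subtag: '{subtag}'")
    return errors


for example in ["sl-IT-nedis", "sl-nedis-nedis", "en-US-Latn", "en-x-averylongtag"]:
    print(example, check_tag(example))
```

Because it only looks at shapes, the sketch will happily read a two-letter subtag in second position as a region even when it was meant as a language; telling those cases apart needs the registry data.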

Try this example.

It doesn’t yet handle extensions, but then there aren’t any valid ones to handle yet anyway.

I hope that’s useful.

>> Use it

This tool allows you to search for subtags that have, say, ‘french’ in their description (there are currently 11), or to find out what that mysterious ‘ch’ subtag stands for (there are 2 possibilities).
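For anyone curious how such a search might work against the raw registry file, here is a minimal Python sketch. It is not the tool's code. It assumes the registry's record-jar layout (records separated by `%%` lines, one `Field: value` pair per line) and ignores folded continuation lines; the `SAMPLE` text is a handful of records trimmed to three fields each, and the function names are hypothetical.

```python
# A few registry-style records, trimmed to the fields used below.
SAMPLE = """\
Type: language
Subtag: ch
Description: Chamorro
%%
Type: language
Subtag: fr
Description: French
%%
Type: language
Subtag: frm
Description: Middle French (ca. 1400-1600)
%%
Type: region
Subtag: CH
Description: Switzerland
"""


def parse_registry(text):
    """Parse record-jar text into a list of {field: [values]} dictionaries."""
    records = []
    for block in text.split("\n%%\n"):
        record = {}
        for line in block.strip().splitlines():
            key, _, value = line.partition(": ")
            # A field such as Description may appear more than once in a record.
            record.setdefault(key, []).append(value)
        if record:
            records.append(record)
    return records


def search_descriptions(records, term):
    """Return the records whose descriptions contain term, ignoring case."""
    term = term.lower()
    return [r for r in records
            if any(term in d.lower() for d in r.get("Description", []))]


def lookup_subtag(records, subtag):
    """Return every record whose subtag matches, ignoring case."""
    subtag = subtag.lower()
    return [r for r in records
            if r.get("Subtag", [""])[0].lower() == subtag]


records = parse_registry(SAMPLE)
for r in search_descriptions(records, "french"):
    print(r["Type"][0], r["Subtag"][0], r["Description"])
for r in lookup_subtag(records, "ch"):
    print(r["Type"][0], r["Subtag"][0], r["Description"])
```

The lookup ignores case because the same letters can name subtags of different types, which is why ‘ch’ comes back with both a language and a region.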

Update: You can now also search for hyphen-separated sequences of subtags, such as sl-IT-nedis, and find out what each of the component subtags means.

Alternatively, you can simply list all current language tags, or script tags, or variants, etc.

For months I’ve been wanting to write a small, Web-based tool for finding things in the subtag registry without having to work on the (for many people, intimidating) raw text file on the IANA site.

Tom Gruner created an initial tool for pretty printing the IANA list, which handled enough of the basics to allow me to use the little time I have these days to add the search functionality on top.

If you have JavaScript running, you are shown just the tags and descriptions initially, but by clicking on those you can reveal all additional information in the registry for a given tag. I also highlight tags that are deprecated, so you can see that straight away.

(PS: Some final tweaks to the code will come when I have a spare moment for things like making the expanding list more accessible, etc.)