It’s disappointing to see the BBC using a non-standard way of encoding Burmese text on its BBC Burmese Facebook page. The text is still UTF-8; what is non-standard is the choice and order of the Unicode code points.

Take, for example, the following text.

On the BBC website itself, it looks like this:

အိန္ဒိယ မိန်းမငယ် ၂ဦး အမှု ဆေးစစ်ချက် ကွဲလွဲနေ

As far as I can tell, this is conformant use of Unicode codepoints.

Look at the same title on the BBC’s Facebook page, however, and you see:

အိႏၵိယ မိန္းမငယ္ ၂ဦး အမႈ ေဆးစစ္ခ်က္ ကြဲလြဲေန

Depending on where you are reading this (and assuming you have a Burmese font and rendering support), one of the two lines of Burmese text above will look garbled. For me, it’s the second (non-standard) one.
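To see the difference at the code point level without relying on fonts, a short Python sketch can list each character's code point and name using the standard `unicodedata` module. `list_codepoints` is a hypothetical helper, and the samples are the word for "India" from each version of the headline, written as escapes so that no font is needed to read them.

```python
import unicodedata


def list_codepoints(text: str) -> list[str]:
    """Return one 'U+XXXX NAME' line for each character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


# The word for "India" from each version of the headline, written as escapes.
standard = "\u1021\u102D\u1014\u1039\u1012\u102D\u101A"   # as on the BBC site
nonstandard = "\u1021\u102D\u108F\u1075\u102D\u101A"      # as on Facebook

for label, word in (("standard", standard), ("non-standard", nonstandard)):
    print(label)
    for line in list_codepoints(word):
        print("  " + line)
```

The standard spelling stacks NA over DA with U+1039 MYANMAR SIGN VIRAMA, while the non-standard one substitutes U+108F and U+1075, code points from the Rumai Palaung and Shan ranges of the Myanmar block.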

This non-standard approach stores combining characters that are displayed before the base consonant, or on both sides of it, in visual rather than logical order (ေဆး instead of ဆေး); uses Shan or Rumai Palaung code points for subjoined consonants (ႏၵ instead of န္ဒ); uses the wrong code points for medial consonants (ခ် instead of ချ, ကြ instead of ကွ); and uses the virama instead of the asat to close a syllable (မိန္း instead of မိန်း).
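The differences listed in the previous paragraph are regular enough to check for mechanically. Below is a minimal Python sketch of a heuristic detector, assuming only the standard `re` module; the pattern names and the functions `find_nonstandard_markers` and `looks_nonstandard` are hypothetical. It covers only three of the symptoms (vowel sign E stored first, virama used as a killer, and borrowed extension code points); it does not check the medials, and genuine Shan or Rumai Palaung text would also trip the third check.

```python
import re

# Heuristic signs of visually encoded Burmese text. These patterns are
# illustrative assumptions, not a complete or authoritative detector.
SUSPICIOUS_PATTERNS = {
    # U+1031 VOWEL SIGN E not preceded by a consonant (U+1000-U+1021) or a
    # medial (U+103B-U+103E), i.e. stored in visual rather than logical order.
    "visual_order_e": re.compile(r"(?<![\u1000-\u1021\u103B-\u103E])\u1031"),
    # U+1039 VIRAMA not followed by a consonant, i.e. used as a syllable-final
    # killer where standard text uses U+103A ASAT.
    "virama_as_asat": re.compile(r"\u1039(?![\u1000-\u1021])"),
    # Code points from the Shan, Rumai Palaung and other extension ranges
    # (U+1060-U+109F), borrowed for subjoined consonants and ligatures.
    "borrowed_codepoint": re.compile(r"[\u1060-\u109F]"),
}


def find_nonstandard_markers(text: str) -> dict[str, int]:
    """Count the matches of each heuristic pattern in text."""
    return {name: len(p.findall(text)) for name, p in SUSPICIOUS_PATTERNS.items()}


def looks_nonstandard(text: str) -> bool:
    """Return True if any heuristic marker of visual encoding is present."""
    return any(find_nonstandard_markers(text).values())


# The syllable "hse" with vowel sign E typed first vs. in logical order.
print(looks_nonstandard("\u1031\u1006\u1038"))  # True
print(looks_nonstandard("\u1006\u1031\u1038"))  # False
```

Counting matches per pattern, rather than returning a single flag, makes it easier to see which symptom a given string shows and to tune the heuristic for mixed-language text.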

I assume this is because the non-standard approach is prevalent on mobile devices, and the BBC is simply following that trend. That approach grew out of hacks that arose when people were impatient to get on the Web but applications were lagging in their support for the script.

However, continuing this divergence does nobody any long-term good.
