Myanmar script notes

I am compiling these notes as I explore the Myanmar script as used for Burmese. They may be updated from time to time.

The page contains brief notes on general script features and discussions about which Unicode characters are most appropriate when there is a choice. See also the companion document, Myanmar Character Notes, which describes the characters used in Myanmar script one by one.

For more detailed information, especially about the history and phonology of the Myanmar script, follow the links in the text and at the bottom of the page.

You can obtain fonts for this page free from the Web, but you must use fonts compatible with Unicode 5.1. For this page I used the free Tharlon.

Brief script introduction

The script is an abugida, ie. consonants carry an inherent vowel sound that is overridden using vowel signs.

Spaces are used to separate phrases, rather than words. Words can be separated with U+200B ZERO WIDTH SPACE to allow for easy wrapping of text.

Text runs from left to right.

There is a set of Myanmar numerals, which are used just like Latin digits.
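As a small illustration of the 'just like Latin digits' point, here is a minimal Python sketch. It relies on the Myanmar digits U+1040..U+1049 being decimal digits in Unicode, so the standard `int()` parses them. The helper name `to_myanmar_digits` is my own invention, not part of any library.

```python
# Myanmar digits occupy U+1040..U+1049 and are Unicode decimal digits,
# so Python's built-in int() can parse them directly.
MYANMAR_ZERO = 0x1040

def to_myanmar_digits(number: int) -> str:
    """Render a non-negative integer using Myanmar digits (hypothetical helper)."""
    return "".join(chr(MYANMAR_ZERO + int(d)) for d in str(number))

year = to_myanmar_digits(2014)
print([f"U+{ord(c):04X}" for c in year])  # code points of the Myanmar digits
print(int(year))                          # parsed back to an ordinary int
```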

Example of Burmese:

လူတိုင်းသည် တူညီ လွတ်လပ်သော ဂုဏ်သိက္ခါဖြင့် လည်းကောင်း၊ တူညီလွတ်လပ်သော အခွင့်အရေးများဖြင့် လည်းကောင်း၊ မွေးဖွားလာသူများ ဖြစ်သည်။ ထိုသူတို့၌ ပိုင်းခြား ဝေဖန်တတ်သော ဉာဏ်နှင့် ကျင့်ဝတ် သိတတ်သော စိတ်တို့ရှိကြ၍ ထိုသူတို့သည် အချင်းချင်း မေတ္တာထား၍ ဆက်ဆံကျင့်သုံးသင့်၏။

Characters used for Burmese

Overview

The Burmese language is tonal and syllable-based.

Words are composed of syllables. These start with a consonant or initial vowel. An initial consonant may be followed by a medial consonant, which adds the sound j or w. After the vowel, a syllable may end with a nasalisation of the vowel or an unreleased glottal stop, though these final sounds can be represented by various different consonant symbols.

At the end of a syllable a final consonant usually has an 'asat' sign above it, to show that there is no inherent vowel.

In multisyllabic words derived from an Indian language such as Pali, where two consonants occur internally with no intervening vowel, the consonants tend to be stacked vertically, and the asat sign is not used.

Burmese uses the Myanmar numerals, which are used just like Latin digits.

Consonants

Native Burmese words use a subset of the consonants that make up the traditional articulatory arrangement of Indic scripts; however, additional symbols are available for use in loan words, especially Indian loan words. These include the retroflex and voiced aspirated consonants. Other characters in the Myanmar Unicode block are used for minority scripts based on the Myanmar script. The latter are not dealt with here.

Syllable-final consonants and asat. When there is a consonant at the end of a syllable, it carries a visible mark called 'asat' to indicate that the inherent vowel is killed, eg. see the small 'c' like mark over the last character in ဝင် (enter). The U+103A MYANMAR SIGN ASAT is a new character provided in Unicode version 5.1 for this purpose. It is effectively a visible virama.

In native Myanmar, 9 characters (5 nasals, င ဉ ည န မ NGA, NYA, NNYA, NA and MA, and 4 stops, က စ တ ပ KA, CA, TA, PA) appear in syllable final position. In final position nasals are pronounced as a nasalization of the previous vowel, eg. ရင် (if), and all stops are pronounced ʔ, eg. မတ် maʔ (March).

Some syllables ending in nasal consonants use the anusvara rather than the ordinary consonant sign, eg. သိမ်း θèɪ̃ but သုံး θòʊ̃.

(Note that the ASAT is also used over ာ and ယ to produce vowel+tone combinations.)

Consonant stacking. In many multi-syllabic words (mostly derived from Pali), consonants that have no intervening inherent vowel are arranged such that the consonant cluster is stacked. The second consonant appears below the first, eg. မန္တလေး mã̀dəlè (Mandalay), and ဗုဒ္ဓ (Buddha). In some cases the lower character is abbreviated or reoriented, eg. က္ဌ.

This effect is achieved in Unicode by using the character U+1039 MYANMAR SIGN VIRAMA between the consonants forming the cluster. Note that the virama is not visible.
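To make the difference between the visible ASAT and the invisible, stacking VIRAMA concrete, here is a short Python sketch that lists the characters in two of the examples above. The code point sequences are my own reading of those examples, and `describe` is a hypothetical helper.

```python
import unicodedata

ASAT = "\u103A"    # visible vowel killer
VIRAMA = "\u1039"  # invisible; causes the next consonant to stack

def describe(text: str) -> list[str]:
    """Return each character as 'U+XXXX NAME' (hypothetical helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

# 'Mandalay': MA, NA, VIRAMA, TA, LA, VOWEL SIGN E, VISARGA
mandalay = "\u1019\u1014" + VIRAMA + "\u1010\u101C\u1031\u1038"
# 'enter': WA, NGA, ASAT
enter = "\u101D\u1004" + ASAT

for line in describe(mandalay) + describe(enter):
    print(line)
```

The stacked word contains U+1039 between NA and TA, whereas the syllable-final NGA in the second word is simply followed by U+103A.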

Consonant repetition. Where the same consonant appears at the end of a syllable and the beginning of a new syllable in the same word they are commonly represented in the usual cluster form, eg. ပိန္နဲသီး pẽ̀nɛ̀dʰì (jackfruit).

In a few Burmese words, however, a doubled consonant is represented by a single consonant and asat, eg. ယောက်ျား yaʊʔcà (man, husband) and ကျွန်ုပ် (first person singular). Note how this produces a situation where an asat is used between a consonant and a medial or vowel sign. Refs: Hosken

A repeated သ can be represented using ဿ U+103F MYANMAR LETTER GREAT SA. In modern Burmese, ဿ appears within words, whereas သ်သ is used across word boundaries. Refs: Uniprop, 3

Medial consonants. YA, WA, RA, and HA have special variant forms when used medially as modifiers of the syllable's vowel. They combine with the preceding characters, eg. ချက် cʰɛʔ (cook), ကြက် cɛʔ (chicken), နွား nwà (cow), and မှာ mʰa (in, at). It is also possible to find two medials associated with a consonant, eg. လျှ lʰjá or ʃá and မြွေ mwe (snake).

Dedicated medial signs exist in Unicode 5.1 for each of these uses. They are combining characters.

Note that Pali and Sanskrit texts written in the Myanmar script, as well as in older orthographies of Burmese, sometimes render the consonants YA, RA, WA and HA in subjoined form. In those cases, U+1039 MYANMAR SIGN VIRAMA and the regular form of the consonant are used. Refs: Unicode

The medial HA is used to create aspirated versions of consonants, and also to create the sound ʃ. The latter is represented by either ရှ or လျှ (see the example above), depending on the word, eg. ရှိတယ် ʃídɛ (to have).

Kinzi. When the first consonant in a consonant cluster is a non-word-final င NGA, it rises over the following letter and keeps its visible virama (the asat), rather than pushing the following consonant below it, eg. အင်္ဂလန် ɪ̃gəlã (England). This is called 'kinzi'. To achieve this, use the sequence U+1004 MYANMAR LETTER NGA, U+103A MYANMAR SIGN ASAT and U+1039 MYANMAR SIGN VIRAMA, then continue with the next letter.
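The kinzi sequence can be spotted in stored text with a simple search. The following is a minimal Python sketch; `find_kinzi` is a hypothetical helper, and the code points for 'England' are my own reading of the example above.

```python
import re

# NGA + ASAT + VIRAMA: the kinzi sequence described above
KINZI = "\u1004\u103A\u1039"

def find_kinzi(text: str) -> list[int]:
    """Return the start offset of every kinzi sequence in text."""
    return [m.start() for m in re.finditer(re.escape(KINZI), text)]

# 'England': A, kinzi, GA, LA, NA, ASAT
england = "\u1021" + KINZI + "\u1002\u101C\u1014\u103A"
print(find_kinzi(england))  # [1]
```

A syllable-final NGA with ASAT alone (no following VIRAMA) is not reported, since it is an ordinary killed consonant rather than a kinzi.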

Changes in Unicode 5.1. In Unicode 5.0, U+103A MYANMAR SIGN ASAT did not exist, and U+1039 MYANMAR SIGN VIRAMA had to be used for both visible and non-visible viramas. This approach was problematic in that, since there are no spaces between words, it is not easy to automatically ascertain whether a virama should appear above a consonant or cause the stacking effect. For example, should my sequence of characters appear like this, အမ်မီတာ , or like this အမ္မီတာ? To get around this in Unicode 5.0 you needed to use a zero-width non-joiner (ZWNJ) after the virama if you want it to remain visible (ie. the first example above would have been transcribed as øm̸ˣʲmītā and the second as øm̸mītā). The non-joiner prevents stacking. In practice, this meant that there were very many ZWNJ characters in Burmese text, since there are many syllable-final consonants needing ASAT, and typing Myanmar was therefore much more time-consuming than it needed to be.
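For old data, the 'VIRAMA followed by ZWNJ' idiom described above can be mapped to the newer ASAT. This is only a sketch under stated assumptions: it handles that one pattern and nothing else (it does not touch kinzi or medials as they were encoded in Unicode 5.0), and the function name is hypothetical.

```python
VIRAMA = "\u1039"
ASAT = "\u103A"
ZWNJ = "\u200C"

def visible_virama_to_asat(text: str) -> str:
    """Replace the Unicode 5.0 'VIRAMA + ZWNJ' idiom with ASAT.

    Only this single pattern is converted; a bare VIRAMA (stacking)
    is left untouched.
    """
    return text.replace(VIRAMA + ZWNJ, ASAT)

# 5.0-style spelling with a visible killer on the first MA:
# A, MA, VIRAMA, ZWNJ, MA, VOWEL SIGN II, TA, VOWEL SIGN AA
old_style = "\u1021\u1019" + VIRAMA + ZWNJ + "\u1019\u102E\u1010\u102C"
# Stacked spelling: the VIRAMA has no ZWNJ after it
stacked = "\u1021\u1019" + VIRAMA + "\u1019\u102E\u1010\u102C"

new_style = visible_virama_to_asat(old_style)
print([f"U+{ord(c):04X}" for c in new_style])
```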

Unicode 5.1 also introduced dedicated medial consonants. This makes it easier to type Myanmar text, but also allows for easy distinction of subjoined variants of these consonants rather than the usual medial forms.

One or two other characters were introduced, such as the TALL AA (described below).

Aspirated consonants. Burmese aspirates many consonants. In some cases these are separate characters, in other cases the aspiration is indicated using U+103E MYANMAR CONSONANT SIGN MEDIAL HA. Aspirated sounds include the following, where the last six use MEDIAL HA: Refs: Mesher, 12

Voicing. Unvoiced syllable initial consonants are typically pronounced with voicing when they appear in non-initial syllables of a word or in particle suffixes, unless they follow a syllable with stopped tone or follow the prefix. Aspirated consonants lose their aspiration at the same time. For example, သတင်းစာ (newspaper) is pronounced θədĩ̀za not θətĩ̀sa. Because of the rule about the stopped tone (ie. a syllable ending in a plosive consonant), however, တစ်ဆယ် (ten) is pronounced təsʰɛ not təzɛ.

Note that care needs to be taken with compound words, since they contain more than one word-initial syllable, eg. နားထောင် (listen) is pronounced nàtʰaʊ̃ not nàdaʊ̃. Refs: Mesher, 175-176

There is also an irregular pattern of voicing initial consonants, particularly with place names. Mesher provides examples of words beginning with စ ပ တ and ထ, eg. စေတီ (pagoda) is pronounced zedi not sedi; ပုဂံ (Pagan/Bagan) is pronounced bəgã not pəgã; ထားဝယ် (Tavoy/Dawei) is pronounced dəwe not tʰəwe. Refs: Mesher, 251

Other phonetic information. The combination of a velar stop and RA or YA is pronounced as c, eg. ကြက် cɛʔ, ကျပ် caʔ.

Some conventions exist for representing foreign sounds. f is ဖ (usually pʰ), v is ဗ (usually b) or ဗွ (usually bw), eg. တီဗွီ tivi. A foreign syllable-final sound can be rendered by placing a second killed consonant after the syllable, sometimes in parentheses, eg. ဘတ်(စ်) bas (bus).

Vowels and tones

The Unicode 5.1 Myanmar block groups vowel characters into 8 independent vowels and 7 dependent vowels.

Inherent vowel. The inherent vowel is a. Most of the time, Mesher treats this phonetically as ə.

The independent form, used for syllable initial position, is represented using အ, eg. အတန်း ətã̀ (class). Note that this is classed as a consonant rather than a vowel by the Burmese, and is actually a vowel carrier with the inherent vowel.

Independent/initial vowels. The consonant အ is used as a support for vowel signs, and the combination of that and the vowel sign is the normal native way of showing independent/initial vowels, eg. အိတ် eɪʔ (bag).

Some independent/initial vowels have an alternative form that is used in some words only - typically Indian loan words, eg. ဧရာဝတီ ejawádi (Irrawaddy river), ဩဂုတ် ɔ̀goʊʔ (August), and ဤ i (this). There are normally different forms for specific tones, and normally only one or two vowel+tone combinations have these forms.

Long vs. short vowels. The 'primary' vowels have 'short' and 'long' written forms that hark back to the earlier Indic script origins, but the distinction is used nowadays for indicating different tones.

Vowel sign placement. Vowel signs appear above, below, or to the left or right of the base consonant. There are also vowel sign combinations that appear both top and bottom, and left and right.

A consonant cluster is treated as a unit when it comes to vowel-signs, for example အငွေ øṅw̱e, where the left-combining E is displayed to the left of the NGA although the character appears after the WA in memory.

On the other hand, vowel signs that would normally appear below a consonant are normally displayed to the right if something else intrudes on that space, such as a stacked consonant eg. စက္ကူ sɛʔku (paper), or a medial consonant eg. အဖြူ əpʰju (white), or a consonant with a 'descender' eg. အညို əɲoʊ (brown).

Contextual shape changes. In order to avoid visual confusion, there are two forms of the long -aa vowel sign in Burmese. ဝာ wa is hard to distinguish from တ ta, so a taller form of AA is used, ie. ဝါ. This form, whether alone or as part of a complex vowel, is used after the following consonants: ခ ဂ င ဒ ပ ဝ, eg. ပေါင် paʊ̃ (thigh). Where there is no ambiguity, however, the normal shape is used, eg. ပြောင်းဖူး pjaʊ̃̀bù (corn).

Whereas in Unicode 5.0 the choice of appropriate form was left to the font or implementation during rendering, such contextual decisions are not appropriate for Sgaw Karen and other minority scripts, which only use the tall form, so U+102B MYANMAR VOWEL SIGN TALL AA was added to Unicode 5.1 as a separate character. Refs: Unicode
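Since the tall form is now a separate character, something at input time has to choose between U+102B and U+102C for Burmese. The sketch below shows one way that choice might be made. It is a deliberate simplification of my own: it looks only at the character immediately before the vowel (skipping VOWEL SIGN E), which is enough to reproduce the two examples above but is not a complete orthographic rule.

```python
TALL_AA = "\u102B"
AA = "\u102C"
VOWEL_E = "\u1031"

# KHA, GA, NGA, DA, PA, WA: the consonants listed above
TALL_AA_BASES = set("\u1001\u1002\u1004\u1012\u1015\u101D")

def choose_aa(preceding: str) -> str:
    """Pick AA or TALL AA given the characters already typed in the syllable.

    Simplification (an assumption, not a full rule): inspect the last
    character, ignoring a trailing VOWEL SIGN E.
    """
    trimmed = preceding.rstrip(VOWEL_E)
    if trimmed and trimmed[-1] in TALL_AA_BASES:
        return TALL_AA
    return AA

# PA + E -> tall form, as in the 'thigh' example
print(choose_aa("\u1015\u1031") == TALL_AA)
# PA + MEDIAL RA + E -> normal form, as in the 'corn' example
print(choose_aa("\u1015\u103C\u1031") == AA)
```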

As mentioned earlier, there are also special long forms of MYANMAR VOWEL SIGN U and MYANMAR VOWEL SIGN UU when there is not enough room for them below a cluster. These forms need to be produced by the font, since there are no special characters for them.

Tones. There are four tones in Burmese: creaky, low, high and stopped. A vowel plus tone combination is called a rhyme. The tone of a syllable can be indicated by the vowel used, or by a combination of vowel and diacritic. The stopped tone only, but always, occurs where a syllable ends in a stop consonant.

[Diagram of tones.] Refs: Mesher, 7

Vowels in open syllables. There are 7 main vowel sounds in open syllables. The following lists those sounds and their different representations for the three tones in Burmese, creaky, low and high, that apply to open syllables. (Combining symbols are shown with အ, and alternate independent forms are shown in parentheses.)

Sound | Description | Creaky | Low | High | Example
a | Primary central | inherent | အာ | အား | လာ la (come)
i | Primary front | အိ (ဣ) | အီ (ဤ) | အီး | မီး (fire)
u | Primary back | အု (ဥ) | အူ (ဦ) | အူး | တူ tu (chopsticks)
e | High front mid | အေ့ (ဧ) | အေ | အေး | နှေး nʰè (slow)
o | High back mid | အို့ | အို | အိုး | ဆိုး sʰò (bad)
ɛ | Low front mid | အဲ့ | အယ် | အဲ | ဘယ် (which)
ɔ | Low back mid | အော့ | အော် (ဪ) | အော (ဩ) | ပျော် pjɔ (happy)

The following table summarises the above in a way that allows you to see how the various tones are applied to open syllables using the native Myanmar characters. Where long vs. short forms exist, for the purposes of clarity in the table, the long form is taken here to be the standard form and the short form a variant.

Vowel | Creaky | Low | High
a | inherent vowel | no mark | visarga
i | short form | no mark | visarga
u | short form | no mark | visarga
e | dot below | no mark | visarga
o | dot below | no mark | visarga
ɛ | dot below | killed-y form | no mark
ɔ | dot below | asat | no mark

Vowels in closed syllables. 'Closed' syllables end in a glottal stop or nasalisation. Historically, however, they ended in one of four nasals or four stops, and this is still reflected in the orthography. The vowel quality has also evolved in these syllables, typically producing diphthongs.

To indicate that the consonant is syllable-final, an asat is placed over it.

The sound values of vowel signs used in open and closed syllables differ systematically as follows.

i becomes eɪ, eg. အိန် ʔeɪ̃; အိတ် ʔeɪʔ.

u becomes oʊ, eg. အုန် ʔoʊ̃; အုတ် ʔoʊʔ.

ɔ becomes aʊ, eg. အောင် ʔaʊ̃; အောက် ʔaʊʔ.

o becomes aɪ, eg. အိုင် ʔaɪ̃; အိုက် ʔaɪʔ.

The inherent a is a lot more complicated, becoming one of ɪ, e, a, or ɛ.

The most common sounds are listed below. There are other combinations of vowel and final consonant found in Burmese words of Indian origin, which often stick to the original Indian spelling; however, they tend to follow Burmese pronunciation, eg. ဓာတ် daʔ, ဗိုလ် bo, ဥယ္ယာဉ် úyĩ.

Vowels in closed syllables ending in nasals. The following table lists the main sounds in Burmese where the syllable ends in a nasal.

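Sound | Final င | Final ည | Final န | Final မ | Example
ã | - | - | အန် | အမ် | ပန်း pã̀ (flower)
ĩ | အင် | - | - | - | ဝင် wɪ̃ (enter)
ɛ | - | အည် | - | - | -
ũ | - | - | အွန် | - | ဇွန်း zũ̀ (spoon)
eɪ̃ | - | - | အိန် | အိမ် | အိမ် eĩ̀ (house)
oʊ̃ | - | - | အုန် | အုမ် | ရန်ကုန် yãkoʊ̃ (Rangoon)
aʊ̃ | အောင် | - | - | - | ကောင်း kaʊ̃̀ (good)
aɪ̃ | အိုင် | - | - | - | ဆိုင် sʰaɪ̃ (store)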

Note how အည် doesn't end in a nasalisation. There is another consonant, ဉ, which has come to be used to produce nasalisation.

These syllables are by default low in tone, but creaky and high tones can be indicated using  -့ and  -း in a very regular way. Note that the tone mark appears at the end of the syllable, not immediately after the vowel, eg. အုန့် and ကောင်း.
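The regularity of this marking is easy to express in code. The sketch below simply appends the tone mark after everything else in the syllable, including the final consonant and its ASAT. The tone labels used as dictionary keys and the function name are my own choices.

```python
DOT_BELOW = "\u1037"  # marks creaky tone
VISARGA = "\u1038"    # marks high tone

# Low tone is the unmarked default for nasal-final syllables.
TONE_MARKS = {"low": "", "creaky": DOT_BELOW, "high": VISARGA}

def with_tone(syllable: str, tone: str) -> str:
    """Append the tone mark at the very end of a nasal-final syllable."""
    return syllable + TONE_MARKS[tone]

# KA, VOWEL SIGN E, VOWEL SIGN AA, NGA, ASAT: the low-tone base of 'good'
base = "\u1000\u1031\u102C\u1004\u103A"
high = with_tone(base, "high")
print([f"U+{ord(c):04X}" for c in high])
```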

Vowels in closed syllables ending in stops. The following table lists the main sounds in Burmese where the syllable ends in a stop.

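Sound | Final က | Final စ | Final တ | Final ပ | Example
aʔ | - | - | အတ် | အပ် | ဖတ် pʰaʔ (read)
iʔ | - | အစ် | - | - | နှစ် nʰiʔ (year)
ɛʔ | အက် | - | - | - | ကြက် cɛʔ (chicken)
uʔ | - | - | အွတ် | - | လွတ်လပ် luʔlaʔ (independent)
eɪʔ | - | - | အိတ် | အိပ် | အရိပ် ayeiʔ (shadow)
oʊʔ | - | - | အုတ် | အုပ် | စာအုပ် saoʊʔ (book)
aʊʔ | အောက် | - | - | - | နောက် naʊʔ (next)
aɪʔ | အိုက် | - | - | - | လိုက် laɪʔ (follow)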

These syllables are all unmarked 4th (stopped) tone.

Vocalic weakening. A process called vocalic weakening affects the first syllables of certain words (mostly nouns and adverbs), eg. ထမင် is pronounced tʰəmĩ́, not tʰámí̃; ဘုရား is pronounced pʰəyà, not pʰúyà.

Codepoint order

The following table shows the order in which characters should be typed and stored in memory for a given syllable, per the description in the Unicode Standard. (It is Burmese-specific and doesn't reflect the order of characters needed for languages such as Karen, Mon, Shan, etc.) Refs: Unicode

kinzi U+1004  + ် U+103A  + ္ U+1039
consonants/vowels [ က U+1000 .. U+1021 | U+1023 .. U+1027 | U+1029 | U+102A | U+103F | U+104E ]
subscript consonant  ္ U+1039 + [ က U+1000 .. U+1008 | U+100A .. U+1019 | U+101B | U+101C | U+101E | U+1020 | U+1021 ]
asat sign   ် U+103A
medial ya*  ျ U+103B (+  ် U+103A)
medial ra  ြ  U+103C
medial wa  ွ U+103D
medial ha  ှ U+103E
vowel sign e  ေ U+1031
vowel sign i, ii, ai [  ိ U+102D |   ီ U+102E |   ဲ U+1032]
vowel sign u, uu [ ု U+102F |  ူ U+1030]
vowel sign tall aa, aa* [ ါ U+102B |  ာ U+102C] (+  ် U+103A)
anusvara  ံ U+1036
dot below  ့ U+1037
visarga  း U+1038

Characters with an asterisk are potentially followed by an asat sign.
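The table above translates fairly directly into a regular expression. The following Python sketch is my own transcription of the rows, in the same order, and checks whether a run of Burmese letters follows that storage order. It is an illustration rather than a validator: it knows nothing about spaces, punctuation, digits or other languages that use the script.

```python
import re

# Each piece mirrors one or more rows of the table, in the same order.
KINZI = "(?:\u1004\u103A\u1039)?"
BASE = "[\u1000-\u1021\u1023-\u1027\u1029\u102A\u103F\u104E]"
SUBSCRIPT = "(?:\u1039[\u1000-\u1008\u100A-\u1019\u101B\u101C\u101E\u1020\u1021])?"
ASAT = "\u103A?"
MEDIALS = "(?:\u103B\u103A?)?\u103C?\u103D?\u103E?"
VOWELS = "\u1031?[\u102D\u102E\u1032]?[\u102F\u1030]?(?:[\u102B\u102C]\u103A?)?"
FINALS = "\u1036?\u1037?\u1038?"  # anusvara, dot below, visarga

CLUSTER = KINZI + BASE + SUBSCRIPT + ASAT + MEDIALS + VOWELS + FINALS
WELL_ORDERED = re.compile(f"(?:{CLUSTER})+")

def is_well_ordered(text: str) -> bool:
    """True if text is a sequence of clusters in the table's order."""
    return WELL_ORDERED.fullmatch(text) is not None

# KA, E, AA, then NGA, ASAT, VISARGA ('good')
print(is_well_ordered("\u1000\u1031\u102C\u1004\u103A\u1038"))  # True
# AA typed before E: out of order
print(is_well_ordered("\u1000\u102C\u1031"))                    # False
```

Note that a syllable-final consonant with its ASAT is matched as a cluster of its own, so one spoken syllable may correspond to two matches of `CLUSTER`.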

Unfortunately, normalization may result in a different order. In particular, U+103A MYANMAR SIGN ASAT occurs after U+1037 MYANMAR SIGN DOT BELOW in normalized text. Applications such as fonts should still handle this alternative order, since the sequences are canonically equivalent.
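The reordering can be observed with Python's standard `unicodedata` module. This is a minimal sketch; `canonically_equal` is a hypothetical helper showing how an application might compare the two orders.

```python
import unicodedata

ASAT = "\u103A"
DOT_BELOW = "\u1037"

def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

typed = "\u1014" + ASAT + DOT_BELOW        # NA, ASAT, DOT BELOW (table order)
normalized = unicodedata.normalize("NFC", typed)

# The two marks swap places because DOT BELOW has the lower combining class.
print([f"U+{ord(c):04X}" for c in normalized])  # ['U+1014', 'U+1037', 'U+103A']
print(unicodedata.combining(DOT_BELOW), unicodedata.combining(ASAT))  # 7 9
print(canonically_equal(typed, normalized))     # True
```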

Notes on shaping

The following is a selection of examples of situations where OpenType or similar font features are needed to produce Burmese text as expected. It is not an exhaustive list.

Glyphs for subscripted consonants tend to be smaller than their full forms, eg. သဒ္ဒါ θəda (grammar), and may be rotated, eg. က္ဌ.

The shape of MEDIAL RA changes according to what it surrounds, eg. compare the two different widths in the word ကြက်သွန်ဖြူ cɛʔθũbju (garlic) and the shortening at the top right of ဝန်ကြီး wũcì (minister). The joining behaviour of MEDIAL YA also differs, eg. ချက် cʰɛʔ (cook) vs ကျွန်မတို့ cəmádó (female we).

The ASAT varies its position and shape according to context, eg. လမ်း lã̀ (road), but ဒေါ်လေး dɔlè (aunt), and ရွှေပဲသီး ʃwebɛ̀ (snow peas).

The shape of NA changes when something appears below it, eg. နို့နဲ့ nónɛ́ (with milk). Similarly, the bottom of NYA also changes in the following context, ပဉ္စမ pjɪ̃zəmá (fifth).

The placement of the DOT BELOW, used as a tone mark, varies slightly according to context, eg. ပြီးခဲ့တဲ့ pìgɛ́dɛ́ (last, ago) and တချို့ təcʰoʊ (some), as does that of MEDIAL HA, eg. it is smaller than usual in ကောက်ညှင်း kaʊʔnʰjɪ̃̀ (sticky rice).

Other examples noted above include the change of shape and position of VOWEL SIGN U and VOWEL SIGN UU when other items appear below the base consonant, and the production of the kinzi.

Text layout

Word boundaries, punctuation, etc.

Spaces are used to separate phrases, rather than words. Words can be separated with ZWSP to allow for easy wrapping of text.
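Here is a minimal sketch of how ZWSP might be inserted and removed programmatically. The split of the phrase into three words is my own assumption, made purely for illustration, and both helper names are hypothetical.

```python
ZWSP = "\u200B"  # ZERO WIDTH SPACE: invisible, but allows a line break

def join_words(words: list[str]) -> str:
    """Join words with ZWSP so that a renderer may wrap between them."""
    return ZWSP.join(words)

def remove_zwsp(text: str) -> str:
    """Strip ZWSP again, eg. before comparing or searching text."""
    return text.replace(ZWSP, "")

# Assumed segmentation of a phrase from the sample text near the top
words = ["တူညီ", "လွတ်လပ်", "သော"]
phrase = join_words(words)

print(phrase.count(ZWSP))                      # 2
print(remove_zwsp(phrase) == "".join(words))   # True
```

Because ZWSP is invisible, text that contains it will not match a plain-text search for the same phrase without it, which is why a stripping step like `remove_zwsp` is often useful.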

The Unicode 7.0 Myanmar block lists 2 punctuation characters. Punctuation is commonly limited to ၊ and ။, with significance close to comma and full stop, respectively.

List of basic symbols

Burmese

This is a list of the main characters or character combinations needed for Burmese.

Basic consonants က
Other consonants
Medial consonants   ျ   ြ   ွ   ှ
Vowel signs   ါ   ာ   ိ   ီ   ု   ူ   ေ   ဲ
Compound vowels   ို   ော
Independent vowels
Combining marks   ်   ံ   ့   း   င်္ ္
Symbols & punctuation
Numbers

To see a list of ligatures and alternative shapes go to the 'shape' view of the Myanmar character picker. (Hint: to see the composition of a conjunct, click on it and select 'Codepoints'.)

Further reading

  1. [Daniels] Peter T. Daniels and William Bright, The World's Writing Systems, Oxford University Press, ISBN 0-19-507993-0
  2. [Hosken] Martin Hosken & Maung Tuntunlwin, Unicode Technical Note #11, Representing Myanmar in Unicode: Details and Examples
  3. [Unicode] The Unicode Standard v5.1, Myanmar.
  4. [Mesher] Gene Mesher, Burmese for Beginners, ISBN 1-887521-51-8
  5. [Uniprop] Ireland (NSAI), United Kingdom (BSI), Myanmar Language Commission, Myanmar Unicode and Natural Language Processing Research Center, Myanmar Computer Federation, Proposal to encode seven additional Myanmar characters in the UCS (restricted access URI)

Available at: rishida.net/scripts/myanmar/.

Content first published 7 April, 2006. This version 2014-10-13 10:55 GMT