These notes are still in development. I am using them to explore the Arabic script as used for Urdu.
This page sets out to list the symbols used to represent Urdu text, describe their use, and relate them to appropriate characters for representation in Unicode. Along the way I also describe the basic phonology associated with the graphical symbols.
In some cases there is some discussion about which Unicode characters are most appropriate, and it was to address these questions that I originally embarked on this.
You need a font that supports nastaliq style for all Urdu characters. See the sidebar for links to the fonts I used. Alternatively you can view the PDF version of this document.
Urdu uses the Arabic script with extensions. A number of the extensions are based on those developed for Persian (Farsi).
The script type is abjad, ie. the script is largely consonantal and short vowel sounds are typically not shown. Some of the consonant characters double as long vowels (eg. ی and و). The vowels are not usually clearly defined, but when necessary, vowel information can be represented by combining marks appearing above or below the base consonant. The absence of a vowel and doubling of consonants can be indicated in the same way.
The basic alphabet covers a much wider repertoire of sounds than found in Arabic, so several extensions have been added to the basic Arabic script. Many of these come via Persian. The alphabet includes aspirated letters that have to be composed with two Unicode characters and a je letter that uses different Unicode characters depending on the context. The alphabet is as follows:
ا ب بھ پ پھ ت تھ ٹ ٹھ ث ج جھ چ چھ ح خ د دھ ڈ ڈھ ر ڑ ڑھ ز ژ س ش ص ض ط ظ ع غ ف ق ک کھ گ گھ ل م ن و ہ ی/ے
Some other characters are used that are not generally counted as part of the alphabet. These are:
In addition, there are:
punctuation characters: ، ؟ ۔
digits: ۰ ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹
and various signs & symbols:

Although it is not always possible to guess the vowel sounds in a word, the consonants are largely reliable phonetically. There is mostly a one-to-one correspondence between letters and sounds.
Since the script is cursive (ie. letters are typically joined) the letter forms can vary considerably according to position.
Urdu is typically written in a nasta'liq style; ie. the connected letters in a word tend to follow a sloping baseline. This is achieved in Unicode by using the correct font - the underlying characters used are not different for nasta'liq vs. other styles.
The absence of a vowel sound can be indicated with a diacritic called sukuːn or ʤazm, although this diacritic is not normally shown in text, eg. سَخْت [saxt] hard.
It has various possible forms, including a small round circle, something that looks like peʃ, and something like a circumflex.
This diacritic is never written above the final character in a word, because as a rule a short vowel is not pronounced in this position.
Consonant sounds can be lengthened. In vowelled text, which is very rare, this is shown using a diacritic called taʃdiːd, eg. ستّر [sattar], seventy. More often than not, this is not written.
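Since the vowel, sukuːn and taʃdiːd marks are usually omitted, a vowelled spelling and an unvowelled one may need to be treated as the same word when searching or comparing text. The following is a minimal sketch of one way to do that in Python; the set of marks and the function name `strip_optional_marks` are illustrative assumptions, not part of any standard.

```python
# Optional marks discussed in these notes (an illustrative, assumed set).
OPTIONAL_MARKS = {
    "\u064E",  # ARABIC FATHA  (zabar)
    "\u0650",  # ARABIC KASRA  (zer)
    "\u064F",  # ARABIC DAMMA  (pesh)
    "\u0652",  # ARABIC SUKUN  (sukun / jazm)
    "\u0651",  # ARABIC SHADDA (tashdid)
}

def strip_optional_marks(text: str) -> str:
    """Return text with the optional combining marks removed."""
    return "".join(ch for ch in text if ch not in OPTIONAL_MARKS)

# Vowelled spelling of [saxt]: seen + fatha + khah + sukun + teh
vowelled = "\u0633\u064E\u062E\u0652\u062A"
plain = strip_optional_marks(vowelled)

print([f"{ord(c):04X}" for c in plain])  # ['0633', '062E', '062A']
```

Whether taʃdiːd should be stripped along with the vowel marks depends on the application; it is included here only because it is also rarely written.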
There are 10 vowel sounds, though there are also allophonic variants. They are usually grouped into pairs of 'short' and 'long' sounds - although the difference is qualitative, rather than just length. The basic phonemes are as follows:
| ə* | ɪ | ʊ | ɛ | ɔ |
| ɑː | iː | uː | e | o |
The following table shows the standard ways of indicating vowel sounds when diacritics are used. Note however, that context can change the value of a vowel diacritic (such as a following ain or he) - these are detailed below the table. Three short vowels are not typically found in final position. The examples only show diacritics for the sound currently being discussed.
| sound | base component | final | medial | initial |
|---|---|---|---|---|
| ə | zabar | | بَب [bəb] | اَب [əb] |
| ɪ | zer | | دِن [dɪn] | اِن [ɪn] |
| ʊ | peʃ | | سُست [sʊst] | اُس [ʊs] |
| ɑː | alɪf | لکھنا [lɪkʰnɑː] | باغ [bɑːɣ] | آج [ɑːʤ] |
| e | je | بجے [baʤe] | بیٹا [beʈɑː] | ایک [ek] |
| iː | zer+je / je | گاری [gɑːriː] | تِین [tiːn] | اِینٹ [iːnʈ] |
| ɛ | zabar+je | ہَے [hɛ] | کَیسا [kɛsɑː] | اَیسا [ɛsɑː] |
| o | vɑːuː | کو [ko] | ٹوپی [ʈopiː] | اوس [os] |
| uː | peʃ+vɑːuː or vɑːuː+inverted peʃ | ہندُو [hɪnduː] ہندوٗ [hɪnduː] | پُورا [puːrɑː] پوٗرا [puːrɑː] | اُوپر [uːpar] اوٗپر [uːpar] |
| ɔ | zabar+vɑːuː | نَو [nɔ] | شَوق [ʃɔq] | اَور [ɔr] |
ain The letter ع is used in words of Arabic origin. In these words it can fulfill the same function as the alɪf that provides a vowel with support at the beginning of an Urdu word, eg. عَرب [arab] Arab. The Urdu word اَرَب [arab] necessity, though pronounced the same, becomes a completely different word by its spelling. Note, in particular, that the equivalent of آ (alɪf+madː) [ɑː] is عا, as in عادت [ɑːdat] habit.
A following ع may also affect a short vowel diacritic to produce a long vowel sound as follows:
[ɑː] from zabar followed by ain, eg. بَعد [bɑːd] after
[e] from zer followed by ain, eg. شِعر [ʃer] verse
[o] from peʃ followed by ain, eg. شُعلہ [ʃolɑː] flame
ʧʰoʈiː he and baɽiː he The letters ہ and ح can also modify preceding short vowels as follows:
[ɛ] from zabar followed by he, eg. اَحمد [ɛhmad] Ahmed, رَہنا [rɛhnɑː] to remain
[ɛ] from zer followed by he, eg. مِہربانی [mɛhrbɑːniː] kindness, and واضِح [vɑːzɛh] clear
[ɔ] from peʃ followed by he, eg. شُہرت [ʃɔhrat] fame, and توجُّہ [tavajːɔh] attention
The so-called 'silent' he that appears at the end of many words of Arabic or Persian derivation is pronounced [ɑː], eg. مکَہ [makːɑː] Mecca.
Vowels may be nasalised, as at the end of the French word élan. This is indicated in Urdu by a glyph called nuːn ɣunːa that looks like the letter nuːn except that in word final position it has no dot, eg. ماں [mãː], mother, ٹاںگ [ʈãːg] leg, and کروں [karũː], I may do.
A hamzā plays more than one role in Urdu. One such role is to indicate the boundaries between vowel sounds when there is no intervening consonant. Depending on the vowels concerned, it is used in a number of different ways. It can also have two different shapes, one like the initial form of 'ain and the other more like an italic 's'.
In this example we see hamza in its isolated form, انشاءﷲ [ɪnʃalːaː] God willing.
When the second vowel is an iː or e represented by ی or ے, the hamzā 'sits on a chair' before the letter representing the second vowel, eg. کئی kaiː several; تیئیس teiːs twenty-three; کوئی koiː someone; گئے gae they went; گائے gɑːe they sang.
The short vowel ɪ as a second vowel is also represented by hamzā 'on its chair', eg. کوئلہ koɪlɑː coal; لائن lɑːɪn queue.
To represent hamzā 'on a chair' for initial or medial positions with the Nafees Nastaleeq script you can use 0626: ARABIC LETTER YEH WITH HAMZA ABOVE. This is not an ideal solution, however, since the hamza is sitting on a je that is not actually appropriate. This becomes particularly problematic when decomposed for normalization. It seems that in order to resolve this issue a new Unicode character will be needed.
When the second vowel is an uː or o represented by و, the hamzā typically sits directly on top of the و, eg. آؤ ɑːo come; جاؤں ʤɑːũː I may go. Note that often the hamzā is omitted in this situation. To represent this in Unicode use 0624: ARABIC LETTER WAW WITH HAMZA ABOVE.
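The normalization concern mentioned above can be observed directly. The sketch below uses Python's standard `unicodedata` module to print the canonical decomposition (NFD) of the precomposed hamza-bearing letters referred to in these notes; the helper name `decompose` is just an illustrative choice.

```python
import unicodedata

# Precomposed letters carrying a hamza that are mentioned in these notes.
PRECOMPOSED = ["\u0626", "\u0624", "\u06C2", "\u06D3"]

def decompose(ch: str) -> list[str]:
    """Return the NFD form of ch as a list of hex code points."""
    return [f"{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]

for ch in PRECOMPOSED:
    print(f"{ord(ch):04X}", unicodedata.name(ch), "->", decompose(ch))

# 0626 decomposes to 064A (ARABIC LETTER YEH) + 0654 (ARABIC HAMZA ABOVE),
# so its base is not 06CC ARABIC LETTER FARSI YEH, the je used for Urdu.
```

By contrast, 0624 decomposes to 0648 + 0654, whose base is the same vɑːuː character used elsewhere in Urdu text.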
Many words have the vowel combinations iːɑ̃ iːe iːo, where hamzā is not typically used, eg. لڑکیاں laɽkiːɑ̃ː girls; چلیے ʧaliːe come on; لڑکیوں کا laɽkiːõ kɑː of the girls.
Hamzā is also used to represent izāfat when the preceding word ends in either choṭī he or ye (see below).
Izāfat [ɪzɑːfat] is the name given to the short vowel ɛ used to describe a relationship between two words. It may be translated of, eg. as in the Lion of Punjab.
This sound is mostly represented using zer. Sometimes, however, the combining mark is not shown, even though pronounced. Examples: شیرِ پنجاب ʃer ɛ panʤɑːb Lion of the Panjab; طالبِ علم tɑːlɪb ɛ ɪlm seeker of knowledge (a student).
Izāfat is represented by a combining hamzā when the preceding word ends in either choṭī he ہ or ye ی: eg. قطرۂآب qatrah ɛ ɑːb drop of water; ولئ کامل valiː ɛ kɑːmɪl perfect saint.
izāfat may also be shown as ے with or without a combining hamzā when the preceding word ends in a long vowel: eg. صدا ۓ بلند sadɑː ɛ buland a high voice; رو ۓ زمین ruː ɛ zamiːn the surface of the ground.
The pronunciation of ال (alif followed by lām) varies when it represents the Arabic definite article. This affects many words in Urdu that have come from Arabic, in particular names and adverbial expressions.
The lām is not pronounced if it precedes one of the following characters: ت 062A te, ث 062B se, د 062F dāl, ذ 0630 zāl, ر 0631 re, ز 0632 ze, س 0633 sīn, ش 0634 šīn, ص 0635 svād, ض 0636 zvād, ط 0637 toe, ظ 0638 zoe, ل 0644 lām, ن 0646 nūn. Instead, the following sound is doubled. A tašdīd may sometimes be used to indicate this. Example: السلام علیکم [asːalɑːm alaikum] greetings.
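The rule above amounts to a lookup against the listed characters. As a rough sketch in Python (the function name `lam_is_assimilated` is hypothetical, and the code assumes the word really does begin with the definite article, which it cannot verify by itself):

```python
# The fourteen letters listed above, before which the lam is not pronounced.
ASSIMILATING = set(
    "\u062A\u062B\u062F\u0630\u0631\u0632\u0633"
    "\u0634\u0635\u0636\u0637\u0638\u0644\u0646"
)
ARTICLE = "\u0627\u0644"  # alif + lam

def lam_is_assimilated(word: str) -> bool:
    """True if word starts with alif+lam followed by one of the listed letters.

    Assumption: the initial alif+lam is the Arabic definite article.
    """
    return (
        len(word) > len(ARTICLE)
        and word.startswith(ARTICLE)
        and word[len(ARTICLE)] in ASSIMILATING
    )

print(lam_is_assimilated("\u0627\u0644\u0633\u0644\u0627\u0645"))  # al + seen: True
print(lam_is_assimilated("\u0627\u0644\u062D\u0627\u0644"))        # al + hah: False
```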
Often the alif is not pronounced after a short preceding word that ends in a vowel. If the preceding vowel was long, it is shortened in this process. Examples: بالکل [bɪlkul] absolutely; فی الحال [filhɑːl] at present.
Often the vowel is pronounced [ʊ], eg. دارالحکومت [dɑːrʊlhʊkuːmat] capital.
0627: ARABIC LETTER ALEF
Urdu Called [alɪf].
This letter of the alphabet represents one of several possible vowels.
In word initial position, alone: a ɪ u
Word initial, followed by ye, ای: iː e ɛ
Word initial, followed by vāū, او: uː o ɔ
With madd combining mark, آ: ɑː (see 0622 ARABIC LETTER ALEF WITH MADDA ABOVE)
Elsewhere: ɑː, unless part of the Arabic definite article (see below).
The alternative sounds possible in the initial combinations can be disambiguated, when necessary, by the use of combining marks. The combining marks are rarely used in normal text (with the exception of madd shown above). See a table of combining marks for vowels.
Arabic definite article The pronunciation of ال (alif followed by lām) varies when it represents the Arabic definite article. This affects many words in Urdu that have come from Arabic, in particular names and adverbial expressions.
Often the alif is not pronounced after a short preceding word that ends in a vowel. If the preceding vowel was long, it is shortened in this process. Examples: بالکل [bɪlkul] absolutely; فی الحال [filhɑːl] at present.
Often the vowel is pronounced [ʊ], eg. دارالحکومت [dɑːrʊlhʊkuːmat] capital.
The lam may also not be pronounced. See 0644 ARABIC LETTER LAM.
0628: ARABIC LETTER BEH
Urdu [b] Called [be].
Nastaliq joining forms: ببب
This character is also followed by 06BE: ARABIC LETTER HEH DOACHASHMEE to give بھ. This is a distinct letter of the Urdu alphabet, called [bʰe]. The two letters together represent the aspirated b in Urdu.
067E: ARABIC LETTER PEH
Urdu [p] Called [pe]
Nastaliq forms: پپپ
This character is also followed by 06BE: ARABIC LETTER HEH DOACHASHMEE to give پھ. This is a distinct letter of the Urdu alphabet, called [pʰe]. The two letters together represent the aspirated consonant [pʰ] in Urdu.
062A: ARABIC LETTER TEH
Urdu [t] Called [te].
Nastaliq joining forms: تتت
This character is also followed by 06BE: ARABIC LETTER HEH DOACHASHMEE to give تھ. This is a distinct letter of the Urdu alphabet, called [tʰe]. The two letters together represent the aspirated consonant [tʰ] in Urdu.
0679: ARABIC LETTER TTEH
Urdu [ʈ] Called [ʈe].
Nastaliq forms: ٹٹٹ ٹ
This character is also followed by 06BE: ARABIC LETTER HEH DOACHASHMEE to give ٹھ. This is a distinct letter of the Urdu alphabet, called [ʈʰe]. The two letters together represent the aspirated retroflex consonant [ʈʰ] in Urdu.
062B: ARABIC LETTER THEH
Urdu [s] Called [se]
Nastaliq joining forms: ثثث
In Urdu, this letter only occurs in words of Arabic and Persian origin, and is much less common than س 0633 ARABIC LETTER SEEN, which is also pronounced [s].
062C: ARABIC LETTER JEEM
Urdu [ʤ] Called [ʤiːm]
Nastaliq forms: ججج
This character is also followed by 06BE: ARABIC LETTER HEH DOACHASHMEE to give جھ. This is a distinct letter of the Urdu alphabet, called [ʤʰe]. The two letters together represent the aspirated consonant [ʤʰ] in Urdu.
0686: ARABIC LETTER TCHEH
Urdu [ʧ] Called [ʧe].
Nastaliq forms: چچچ چ
This character is also followed by 06BE: ARABIC LETTER HEH DOACHASHMEE to give چھ. This is a distinct letter of the Urdu alphabet, called [ʧʰe]. The two letters together represent the aspirated consonant [ʧʰ] in Urdu.
062D: ARABIC LETTER HAH
Urdu [h] Called [baɽiː he].
Nastaliq forms: ححح ح
062E: ARABIC LETTER KHAH
Urdu [x] Called [xe].
Nastaliq forms: خخخ خ
062F: ARABIC LETTER DAL
Urdu [d] Called [dɑːl].
Nastaliq forms: ـد د
This character is also followed by 06BE: ARABIC LETTER HEH DOACHASHMEE to give دھ. This is a distinct letter of the Urdu alphabet, called [dʰe]. The two letters together represent the aspirated consonant [dʰ] in Urdu.
0688: ARABIC LETTER DDAL
Urdu [ɖ] Called [ɖɑːl].
Nastaliq forms: ـڈ ڈ
This character is also followed by 06BE: ARABIC LETTER HEH DOACHASHMEE to give ڈھ. This is a distinct letter of the Urdu alphabet, called [ɖʰe]. The two letters together represent the aspirated retroflex consonant [ɖʰ] in Urdu.
0631: ARABIC LETTER REH
Urdu [r] (pronounced with a trill) Called re [re]
Nastaliq forms: ـر ر
0691: ARABIC LETTER RREH
Urdu [ɽ] Called [ɽe].
Nastaliq forms: ـڑ ڑ
This character is also followed by 06BE: ARABIC LETTER HEH DOACHASHMEE to give ڑھ. This is a distinct letter of the Urdu alphabet, called [ɽʰe]. The two letters together represent the aspirated retroflex consonant [ɽʰ] in Urdu.
0632: ARABIC LETTER ZAIN
Urdu [z] Called [ze].
Nastaliq forms: ـز ز
0698: ARABIC LETTER JEH
Urdu [ʒ] Called [ʒe].
Nastaliq forms: ـژ ژ
0633: ARABIC LETTER SEEN
Urdu [s] Called sīn [siːn].
Nastaliq forms: سسس س
In Urdu nastaliq text this can have two somewhat different shapes. In addition to the shape shown here, the wavy part of the letter is sometimes a single swash - especially when two sīn characters are written together.
Use the same character for both visual forms. When one or other of the possible shapes is desired, this should be produced by the font.
0634: ARABIC LETTER SHEEN
Urdu [ʃ] Called [ʃiːn].
Nastaliq forms: ششش ش
In Urdu nastaliq text this can have two somewhat different shapes. In addition to the shape shown here, the wavy part of the letter is sometimes a single swash - especially when two šīn characters are written together.
Use the same character for both visual forms. When one or other of the possible shapes is desired, this should be produced by the font.
0635: ARABIC LETTER SAD
Urdu [s] Called svād [svɑːd].
Nastaliq forms: صصص ص
Only used in words of Arabic origin.
0636: ARABIC LETTER DAD
Urdu [z] Called zvād [zvɑːd].
Nastaliq forms: ضضض ض
Only used in words of Arabic origin.
0637: ARABIC LETTER TAH
Urdu [t] Called toe [toe].
Nastaliq forms: ططط ط
Only used in words of Arabic origin.
0638: ARABIC LETTER ZAH
Urdu [z] Called zoe [zoe].
Nastaliq forms: ظظظ ظ
Only used in words of Arabic origin.
0639: ARABIC LETTER AIN
Urdu No sound. Called ain.
Nastaliq forms: ععع ع
No sound, but this letter is preserved in Arabic words in which it occurs.
At beginning of a word it functions like alɪf, carrying a vowel, eg. عرب [arab], Arab. An initial long ɑː, usually represented by alɪf with madː آ, can sometimes be represented by ain followed by alɪf, eg. عادت [ɑːdat] habit.
Note how, although they are pronounced the same, عرب [arab] Arab, and ارب [arab] necessity, indicate different words.
In non-initial positions ain can cause a change in sound to preceding short vowels, resulting in long vowels, but not always the long form typically associated with a given short form.
ain changes a short [a] to [ɑː], eg. بعد [bɑːd] after.
It changes a short [ɪ] to [e], eg. شعر [ʃer] verse.
It changes a short [ʊ] to [o], eg. شعلہ [ʃolɑː] flame.
063A: ARABIC LETTER GHAIN
Urdu [ɣ] Called [ɣain].
Nastaliq forms: غغغ غ
0641: ARABIC LETTER FEH
Urdu [f] Called [fe].
Nastaliq forms: ففف ف
0642: ARABIC LETTER QAF
Urdu [q] Called [qɑːf].
Nastaliq forms: ققق ق
06A9: ARABIC LETTER KEHEH
Urdu [k] Called [kɑːf].
Nastaliq forms: ککک ک
When followed by alif or lām, this has a special rounded shape, eg. کا [kɑː], of; کل [kal], yesterday.
This character is also followed by 06BE: ARABIC LETTER HEH DOACHASHMEE to give کھ. This is a distinct letter of the Urdu alphabet, called [kʰe]. The two letters together represent the aspirated consonant [kʰ] in Urdu.
06AF: ARABIC LETTER GAF
Urdu [g] Called [gɑːf].
Nastaliq forms: گگگ گ
When followed by alif or lām, this has a special rounded shape, eg. گام [gɑːm], step; گل [gul], rose.
This character is also followed by 06BE: ARABIC LETTER HEH DOACHASHMEE to give گھ. This is a distinct letter of the Urdu alphabet, called [gʰe]. The two letters together represent the aspirated consonant [gʰ] in Urdu.
0644: ARABIC LETTER LAM
Urdu [l] Called [lɑːm].
Nastaliq forms: للل ل
Combined with a following alif, lām is usually written as لا, eg. گلاس [gilɑːs], glass. Sometimes, however, especially in words of Arabic origin such as the equivalent of the English prefix 'un-', the more Arabic form لا is used, eg. لاعلاج [lɑːʕilɑːʤ], incurable.
Note that I can't find a way to make this example work with a single font. To produce it I had to mix two different fonts!
Arabic definite article The pronunciation of ال (alif followed by lām) varies when it represents the Arabic definite article. This affects many words in Urdu that have come from Arabic, in particular names and adverbial expressions.
The lām is not pronounced if it precedes one of the following characters: ت 062A te, ث 062B se, د 062F dāl, ذ 0630 zāl, ر 0631 re, ز 0632 ze, س 0633 sīn, ش 0634 šīn, ص 0635 svād, ض 0636 zvād, ط 0637 toe, ظ 0638 zoe, ل 0644 lām, ن 0646 nūn. Instead, the following sound is doubled. A tašdīd may sometimes be used to indicate this. Example: السلام علیکم [asːalɑːm alaikum] greetings.
The sound of the alif may also be affected. See 0627 ARABIC LETTER ALEF.
0645: ARABIC LETTER MEEM
Urdu [m] Called [miːm].
Nastaliq forms: ممم م
0646: ARABIC LETTER NOON
Urdu [n] Called [nuːn].
Nastaliq forms: ننن ن
Used for the nasal consonant, but also used to represent word medial nasalisation of vowels, eg. ٹانگ ʈãːg, leg.
0648: ARABIC LETTER WAW
Urdu [v uː o ɔ] Called vɑːuː.
Nastaliq forms: ـو و
Both a consonant and a vowel.
As a consonant, it is a cross between v and w, eg. والد [vɑːlɪd] father, نومبر [navambar] November.
As a vowel, whether word initial after alɪf, او, or elsewhere on its own, it is one of [uː o ɔ].
The alternative vowel sounds can be disambiguated, when necessary, by the use of combining marks. The combining marks are rarely used in normal text. See a table of combining marks for vowels.
In a number of words of Persian origin beginning with خوا it is silent, eg. خواب [xɑːb] dream.
In two very common words it represents the short vowel [ʊ]: خود [xʊd] self, and خوش [xʊʃ] happy.
06C1: ARABIC LETTER HEH GOAL
Urdu [h] Called [ʧʰoʈiː he].
Nastaliq forms: ہہہ ہ
The initial form is written with a hook beneath, eg. ہندو [hɪnduː] Hindu. The medial can be written with or without, eg. کہاں [kahãː] where.
A special initial form is used before alif or lam, eg. ہاں [hãː] yes, and اہل [ahl] people.
Silent he: In Urdu words this letter is pronounced at the end of a word. Many Arabic and Persian words end in a he that is pronounced [ɑː] (just like alif), eg. مکّہ [makːɑː] Mecca.
A word like [rɑːʤɑː], king, can be spelled with either an alif or a he, ie. راجا or راجہ. This is because the original Indian word was borrowed into Persian, then back into Urdu. Both spellings are now acceptable.
Doubled he: In order to distinguish some words where the final h is pronounced rather than representing [ɑː] or [ɛ], the choṭī he is sometimes doubled, eg. کہہ [kɛh] say vs. کہ [kɛ].
Aspiration: Until recently choṭī he ہ and do cašmī he ھ could be used interchangeably, eg. ہاں or ھاں for [hãː] yes. Modern practice is to use the latter exclusively for aspiration, though people do still occasionally confuse the two.
Vowels: ʧʰoʈiː he can convert the preceding vowel from what would otherwise have been [a], [ɪ] and [ʊ] to [ɛ] and [ɔ], eg. اَحمد [ɛhmad] Ahmed, رَہنا [rɛhnɑː] to remain.
06CC: ARABIC LETTER FARSI YEH
Urdu [iː e ɛ j] Called je.
Nastaliq forms: یی ی
The Urdu letter je has two distinct visual forms requiring the use of two Unicode characters: this one ی and ے 06D2: ARABIC LETTER YEH BARREE.
The letter je represents both a consonant and a vowel.
At the beginning and middle of a word the form ی usually represents a consonant, eg. یار [jɑːr], friend and سایہ [sɑːjɑː], shadow.
As a vowel, the form ی is used in word initial position after alɪf ای, and in medial position on its own, to represent [iː e ɛ], eg. ایک [ek] one, سینہ [siːnɑː] breast, and کیسا [kɛsɑː] how.
To represent the vowels [e] or [ɛ] in final position the form ے is used, eg. لڑکے [laɽke] boys.
In word final position the vowel form ی represents only [iː], eg. لڑکی [laɽkiː] girl.
Also, in isolated form ی represents [iː], whereas ے stands for [e] or [ɛ].
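The word-final behaviour just described can be summarised as a small lookup keyed on the Unicode character used. This is only a sketch under the assumptions stated in these notes (a final ی is taken to be the vowel [iː]); the function name `final_je_vowels` is made up for illustration.

```python
FARSI_YEH = "\u06CC"   # ARABIC LETTER FARSI YEH
YEH_BARREE = "\u06D2"  # ARABIC LETTER YEH BARREE

def final_je_vowels(word: str) -> list[str]:
    """Return the candidate vowel sounds for a word-final je, if any."""
    if word.endswith(FARSI_YEH):
        return ["iː"]
    if word.endswith(YEH_BARREE):
        return ["e", "ɛ"]
    return []  # the word does not end in either form of je

print(final_je_vowels("\u0644\u0691\u06A9\u06CC"))  # ['iː']
print(final_je_vowels("\u0644\u0691\u06A9\u06D2"))  # ['e', 'ɛ']
```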
The alternative vowel sounds can be disambiguated, when necessary, by the use of combining marks. The combining marks are rarely used in normal text. See a table of combining marks for vowels.
The baɽiː je form is also used to represent izāfat. See 06D2 ARABIC LETTER YEH BARREE.
06D2: ARABIC LETTER YEH BARREE
Urdu [e ɛ] Called [baɽiː je].
Nastaliq forms: ـے ے
The Urdu letter [je] has two distinct visual forms requiring the use of two Unicode characters. This one ے and ی 06CC ARABIC LETTER FARSI YEH.
The letter [je] represents both a consonant and a vowel, but this form is used only for vowels.
At the beginning and middle of a word the form ی usually represents a consonant, eg. یار [jɑːr], friend and سایہ [sɑːjɑː], shadow.
As a vowel, the form ی is used in word initial position after alif ای, and in medial position on its own, to represent [iː e ɛ], eg. ایک [ek], one and سینہ [siːnɑː], breast.
To represent the vowels [e] or [ɛ] in final position the form ے is used, eg. لڑکے [laɽke], boys.
In word final position the vowel form ی represents only [iː], eg. لڑکی [laɽkiː], girl.
Also, in isolated form ی represents [iː], whereas ے stands for [e] or [ɛ].
The alternative sounds possible in the initial combinations can be disambiguated, when necessary, by the use of combining marks. The combining marks are rarely used in normal text. See a table of combining marks for vowels
Izāfat: [ɪzɑːfat] is the name given to the short vowel [ɛ] used to describe a relationship between two words. It may be translated of, eg. as in the Lion of Punjab.
This sound is mostly represented using zer, but can also be represented with a combining hamza in a couple of cases.
izāfat may also be shown as ے with or without a combining hamzā when the preceding word ends in a long vowel: eg. صدا ۓ بلند [sadɑː ɛ buland] a high voice; رو ۓ زمین [ruː ɛ zamiːn] the surface of the ground.
There are other ways in which izafat can be formed.
0622: ARABIC LETTER ALEF WITH MADDA ABOVE
Urdu [ɑː] Called madː.
Used in combination with alɪf at the beginning of a word to give the sound [ɑː], eg. آب [ɑːb] water. Unlike the short vowel diacritics, the diacritic madː is never omitted.
alɪf with madː is exceptionally used in non-initial position in the word for Koran, القرآن.
madː means increasing
06BA: ARABIC LETTER NOON GHUNNA
Urdu Nasalisation. Called nuːn ɣunːa.
Nastaliq forms ںںں ں
Indicates that the preceding vowel is nasalised.
At the end of a word, an undotted form is used, eg. ماں [mãː], mother, کروں [karũː], I may do.
Nasalization within a word uses a form with a dot that looks just like the letter 0646 ARABIC LETTER NOON ن, eg. ٹاںگ [ʈãːg] leg.
This is not counted as a regular letter of the Urdu alphabet.
06BE: ARABIC LETTER HEH DOACHASHMEE
Urdu Called [] (do cašmī he).
Aspiration: This character is used to create the aspirated letters of the Urdu alphabet. Each letter is composed of two characters. The letters are: بھ bʰe, پھ pʰe, تھ tʰe, ٹھ ʈʰe, جھ ʤʰe, چھ ʧʰe, دھ dʰe, ڈھ ɖʰe, ڑھ ɽʰe, کھ kʰe, and گھ gʰe.
Until recently choṭī he 06C1 ARABIC LETTER HEH GOAL ہ and do cašmī he could be used interchangeably to express aspiration, eg. ہاں or ھاں for [hãː] yes. Modern practice is to use this character exclusively for aspiration, though people do still occasionally confuse the two.
Calendar indicator: Dates using the Muslim calendar are followed by the word ہجری [hɪʤriː], which is abbreviated with the symbol ھ.
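Because each aspirated letter is stored as two characters, counting or sorting by Urdu alphabet letters means grouping a base character with a following 06BE. The following Python sketch shows one simple way to do this; `urdu_letters` is a hypothetical helper, and it assumes that no combining mark sits between the base and the do cašmī he and that 06BE is not being used as the calendar abbreviation.

```python
DO_CHASHMI_HE = "\u06BE"  # ARABIC LETTER HEH DOACHASHMEE

def urdu_letters(word: str) -> list[str]:
    """Split word into units, attaching each 06BE to the preceding character."""
    letters: list[str] = []
    for ch in word:
        if ch == DO_CHASHMI_HE and letters:
            letters[-1] += ch      # base + 06BE forms one aspirated letter
        else:
            letters.append(ch)
    return letters

# lam, keheh, heh doachashmee, noon, alef
word = "\u0644\u06A9\u06BE\u0646\u0627"
units = urdu_letters(word)
print(len(word), len(units))  # 5 4
```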
06C2: ARABIC LETTER HEH GOAL WITH HAMZA ABOVE
Urdu [ɛ]
Izāfat [ɪzɑːfat] is the name given to the short vowel [ɛ] used to describe a relationship between two words. It may be translated of, eg. as in the Lion of Punjab.
This sound is mostly represented using zer, but in certain cases can be represented with a combining hamza. One such case occurs when the preceding word ends in choṭī he ہ: eg. قطرۂآب [qatrah ɛ ɑːb] drop of water.
There are other ways in which izafat can be formed.
06D3: ARABIC LETTER YEH BARREE WITH HAMZA ABOVE
Urdu [ɛ]
[ɪzɑːfat] is the name given to the short vowel [ɛ] used to describe a relationship between two words. It may be translated of, eg. as in the Lion of Punjab.
This sound is mostly represented using zer, but can also be represented with a combining hamza in a couple of cases.
Izāfat may also be shown as ے with or without a combining hamzā when the preceding word ends in a long vowel: eg. صدا ۓ بلند [sadɑː ɛ buland] a high voice; روۓزمین [ruː ɛ zamiːn] the surface of the ground.
There are other ways in which izafat can be formed.
064B: ARABIC FATHATAN
Urdu [an]
This is a doubled zabar. These marks appear at the ends of certain Arabic adverbs. The doubled zabar is the most common of the three marks of this type. Although the mark appears over an alif the vowel sound is short. Examples: یقیناً [jaqiːnan] certainly; مثلاً [masalan] for example.
064C: ARABIC DAMMATAN
Urdu [un]
Doubled peš.
064D: ARABIC KASRATAN
Urdu [in]
Doubled zer.
064F: ARABIC DAMMA
Urdu [ʊ uː o ɔ] Called peʃ.
Rarely used; only where pronunciation needs to be spelled out. peʃ means forward.
Above a consonant it typically indicates a following short [ʊ], eg. بُب [bʊb]. At the beginning of a word it appears above alɪf or ain, eg. اُب [ʊb].
When the base consonant is followed by certain other letters, peʃ represents different sounds, as shown below:
In combination with a following vɑːuː this represents [uː], eg. پُورا [puːrɑː] full, and اُوپر [uːpar] above.
In combination with a following ain this represents [o], eg. شُعلہ [ʃolɑː] flame, and توقُّع [tavaqːo] hope.
A following ʧʰoʈiː he or baɽiː he turns this into [ɔ], eg. شُہرت [ʃɔhrat] fame, and توجُّہ [tavajːɔh] attention.
In two very common words, with a following vɑːuː it represents the short vowel [ʊ]: خُود [xʊd] self, and خُوش [xʊʃ] happy.
The word وہ [vo] that, he, she, it is irregular.
See a table of combining marks for vowels.
0650: ARABIC KASRA
Urdu [ɪ iː ɛ e] Called zer.
Rarely used; only where pronunciation needs to be spelled out. zer means below.
Below a base consonant it typically indicates a following short [ɪ], eg. بِب [bɪb]. At the beginning of a word it appears below alɪf or ain, eg. اِتْنَا [ɪtnɑː] so much and عِلْم [ɪlm] knowledge.
When the base consonant is followed by certain other letters, zer represents different sounds, as shown below:
In combination with a following je this represents [iː], eg. سِینہ [siːnɑː] breast, and اِیمان [iːmɑːn] faith.
In combination with a following ain this represents [e], eg. شِعر [ʃer] verse, and واقِع [vɑːqe] situated.
A following ʧʰoʈiː he or baɽiː he turns this into [ɛ], eg. مِہربانی [mɛhrbɑːniː] kindness, and واضِح [vɑːzɛh] clear.
See a table of combining marks for vowels.
ɪzāfat: (pronounced [ɪzɑːfat]) this is the name given to the short vowel [ɛ] when used to describe a relationship between two words. It may be translated of, eg. as in the Lion of Punjab.
This sound is mostly represented using zer. Sometimes, however, the combining mark is not shown, even though pronounced. Examples: شیرِ پنجاب [ʃer ɛ panʤɑːb] Lion of the Panjab; طالبِ علم [tɑːlɪb ɛ ɪlm] seeker of knowledge (a student).
There are other ways in which ɪzāfat can be formed.
0651: ARABIC SHADDA
Urdu Called taʃdiːd.
Doubles the sound of the base consonant, eg. ستّر [sattar] seventy. More often than not, this is not written.
taʃdiːd means strengthening.
064E: ARABIC FATHA
Urdu [a ɑː ɛ ɔ] Called zabar.
Rarely used; only where pronunciation needs to be spelled out. zabar means above.
Above a consonant it typically indicates a following short [a], eg. بَب [bab]. At the beginning of a word it appears above alɪf or ain, eg. اَب [ab] now, and عَرَب [arab] Arab.
When the base consonant is followed by certain other letters, zabar represents different sounds, as shown below:
In combination with a following alɪf, silent ʧʰoʈiː he, or ain this represents [ɑː], eg. بَاغ [bɑːɣ] garden, مکَہ [makːɑː] Mecca, and بَعد [bɑːd] after.
In combination with a following je (both forms) this represents [ɛ], eg. جَیسا [ʤɛsɑː] as, اَیسا [ɛsɑː] such, and ہَے [hɛ] is.
A following ʧʰoʈiː he or baɽiː he also turns this into [ɛ], eg. اَحمد [ɛhmad] Ahmed, and رَہنا [rɛhnɑː] to remain.
In combination with a following vɑːuː this represents [ɔ], eg. شَوق [ʃɔq] keenness, and اَور [ɔr] and.
See a table of combining marks for vowels.
0652: ARABIC SUKUN
Urdu Called sukuːn or ʤazm.
Rarely used; indicates absence of a vowel between consonants, eg. سَخْت [saxt] hard.
It has various possible forms, including a small round circle, something that looks like peʃ, and something like a circumflex. (There is another Unicode character that provides an alternative visual form, 06E1: ARABIC SMALL HIGH DOTLESS HEAD OF KHAH, but it is better to use this character and select the variant required using a font.)
This diacritic is never written above the final character in a word, because as a rule a short vowel is not pronounced in this position.
sukuːn is an Arabic word meaning rest or pause.
0656: ARABIC SUBSCRIPT ALEF
Urdu
Used to indicate a long [iː] vowel, or [i] as contrasted with [e], eg. نُحْیٖ. This diacritic is not usually needed, and serves only to emphasise that the vowel is long.
0657: ARABIC INVERTED DAMMA
Urdu
Used to indicate a long [uː] vowel, or [ʊ] as contrasted with [ɔ], eg. حبل حلالہٗ. This diacritic is not usually needed, and serves only to emphasise that the vowel is long.
0658: ARABIC MARK NOON GHUNNA
Nasalisation of Urdu vowels is normally indicated by 06BA ARABIC LETTER NOON GHUNNA ں. At the end of a word this has no dot above, but in the middle of a word it looks exactly like 0646 ARABIC LETTER NOON ن (and some people may use this character for this purpose).
This diacritic is used when people want to make it clear that this glyph represents nasalisation rather than the letter nuːn. It is not used in a standard way, just when the user prefers, and is fairly uncommon, eg. ساں٘گ. The CRULP fonts don't appear to show the diacritic as expected.
0670: ARABIC LETTER SUPERSCRIPT ALEF
Urdu [ɑː]
Used in a few Arabic words over the final form of 06CC ARABIC LETTER FARSI YEH ی to produce the sound ɑː: eg. اعلیٰ [alɑː] paramount, highest; دعویٰ [davɑː] law suit, claim.
060C: ARABIC COMMA
Urdu Called [].
Looks like ،
060D: ARABIC DATE SEPARATOR
Urdu Separator used in dates between the (numeric) date and the month name, eg. ۴؍صفر ۱۳۰۲ھ.
061F: ARABIC QUESTION MARK
Urdu Called [].
Looks like ؟
066B: ARABIC DECIMAL SEPARATOR
Urdu Called [aʃɑːrɪjɑ].
Looks like ٫
This should look like a hamzā in Urdu, eg. ۲۵۲۴٫۲۳ [do hazɑːr pɑ̃ːʧ sau caubiːs aʃɑːrɪjɑː do tiːn] 2524.23.
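To read such a number in software, the digits and the decimal separator can be mapped to their ASCII equivalents before parsing. The Python sketch below assumes the digits are the ones in the range 06F0 to 06F9 and that 066B is the only separator present; `parse_urdu_number` is an illustrative name.

```python
# Map 06F0..06F9 to '0'..'9' and 066B (decimal separator) to '.'.
TO_ASCII = {0x06F0 + i: str(i) for i in range(10)}
TO_ASCII[0x066B] = "."

def parse_urdu_number(text: str) -> float:
    """Convert a string of Urdu digits (with optional 066B) to a float."""
    return float(text.translate(TO_ASCII))

# The example above: 2, 5, 2, 4, decimal separator, 2, 3
value = parse_urdu_number("\u06F2\u06F5\u06F2\u06F4\u066B\u06F2\u06F3")
print(value)  # 2524.23
```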
06D4: ARABIC FULL STOP
Urdu Called [].
Looks like ۔
0601: ARABIC SIGN SANAH
Urdu Called [sanh].
Gregorian dates are indicated by placing this long sweep below the year digits, together with the word عیسوی [iːsviː] (Christian era). That word is usually abbreviated to a hamza ء.
Dates using the Muslim calendar are followed by the word ہجری [hɪʤriː], which is abbreviated with the symbol ھ.
The sanh sign is typed before the digits (in a rtl context): eg. ۲۰۰۴ء 2004. It is not a combining character, even though it displays beneath the digits.
The sanh is derived from the Arabic word for year سنة.
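The typing order and the non-combining status described above can be sketched in code. This is an illustration under stated assumptions: `year_with_sanah` is a hypothetical helper, and appending the hamza abbreviation after the digits follows the example given above.

```python
import unicodedata

SANAH = "\u0601"  # ARABIC SIGN SANAH
HAMZA = "\u0621"  # ARABIC LETTER HAMZA, abbreviating the era word


def year_with_sanah(year, era_mark=HAMZA):
    """Build the stored sequence: sanah sign first, then digits, then era mark."""
    digits = "".join(chr(0x06F0 + int(d)) for d in str(year))
    return SANAH + digits + era_mark


if __name__ == "__main__":
    text = year_with_sanah(2004)
    print([f"U+{ord(ch):04X}" for ch in text])
    # Category and combining class as recorded in the Unicode database.
    print(unicodedata.category(SANAH), unicodedata.combining(SANAH))
```

The sign has general category Cf (format) and canonical combining class 0, which matches the observation that it is not a combining character even though a font draws it beneath the digits that follow.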
0603: ARABIC SIGN SAFHA
Urdu Called [safah].
Used to indicate a page number, where English would use an abbreviation such as "pp. 35-45", eg. ۴۵. The stroke may be elongated and pass under the number.
The symbol is derived from the stroke used for 0635: ARABIC LETTER SAD.
0600: ARABIC NUMBER SIGN
Urdu Used to indicate the beginning of a number, eg. ۱۲۳. The stroke may be elongated and pass under the number, but this is not a combining character.
0602: ARABIC FOOTNOTE MARKER
Urdu Used to indicate that a number is a footnote, eg. ؎۵. The number usually sits above the symbol, although this is not a combining character. I can't figure out whether it needs to be typed before or after the number, though I think before. The Nafees font doesn't seem to support this.
Do not confuse this with 060E ARABIC POETIC VERSE SIGN.
060F: ARABIC SIGN MISRA
Urdu Called misra
Urdu poems are typically built from couplets. This symbol marks a single line (misra) of a couplet (shayr) from an Urdu poem when that line is quoted in running text.
The sign is placed at the beginning of the quoted line, and is followed by the line of verse. See an example.
0610: ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM
Urdu Represents sallallahu alayhe wasallam [sallallao alae va sallam] (may God's peace and blessings be upon him) صلّى الله عليه وسلّم. Used over the name of Mohammed.
One of several marks that represent phrases expressing the status of a person, most having specifically religious meaning.
The mark is really associated with a word, rather than a character, but the placement is left to the user. The mark is often added somewhere in the middle of a name, but commonly appears towards the end. This depends to some extent on the letter shapes present and the calligraphic style in use, eg. محمّدؐ [muhamːed sallallao alae va sallam].
0611: ARABIC SIGN ALAYHE ASSALLAM
Urdu Represents alayhe asallam [alejsallam] (upon him be peace) عليه السّلام. Used over the name of prophets other than Mohammed.
One of several marks that represent phrases expressing the status of a person, most having specifically religious meaning.
The mark is really associated with a word, rather than a character, but the placement is left to the user. The mark is often added somewhere in the middle of a name, but commonly appears towards the end. This depends to some extent on the letter shapes present and the calligraphic style in use, eg. عیسؑیٰ [isaː salejsallam] Christ, upon him be peace!.
0612: ARABIC SIGN RAHMATULLAH ALAYHE
Urdu Represents rahmatulla alayhe [raːmatʊlla alee] (may God have mercy upon him) رحمت الله عليه. Used over the names of saints, religious authorities, and other deceased pious persons.
One of several marks that represent phrases expressing the status of a person, most having specifically religious meaning.
The mark is really associated with a word, rather than a character, but the placement is left to the user. The mark is often added somewhere in the middle of a name, but commonly appears towards the end. This depends to some extent on the letter shapes present and the calligraphic style in use, eg. قاضی نور محمّدؒ [kaziː nur mamed rahmatulla alayhe] Qazi Nur Muhammad, may God have mercy upon him!.
0613: ARABIC SIGN RADI ALLAHOU ANHU
Urdu Represents radi allahu 'anhu [raziallaːo ano ] (may God be pleased with him) رضي الله عنه. Used over the names of the Companions of the Prophet.
One of several marks that represent phrases expressing the status of a person, most having specifically religious meaning.
The mark is really associated with a word, rather than a character, but the placement is left to the user. The mark is often added somewhere in the middle of a name, but commonly appears towards the end. This depends to some extent on the letter shapes present and the calligraphic style in use, eg. ابوبکرؓ [abu bakr, raziallaːo ano] Abu Bakr, may God be pleased with him!.
0614: ARABIC SIGN TAKHALLUS
Urdu Sign placed over the name or nom-de-plume of a poet, or in some writings used to mark all proper names.
The mark is really associated with a word, rather than a character, but the placement is left to the user. The mark is often added somewhere in the middle of a name, but commonly appears towards the end. This depends to some extent on the letter shapes present and the calligraphic style in use, eg. عطاشادؔ [ataː ʃaːd] Ata Shad (author's name). There seems to be a problem displaying this with Nafees fonts.
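Because the five marks from 0610 to 0614 are encoded as combining marks attached to whichever letter the writer chooses, the same name can be stored with the mark in different positions. The sketch below shows one way to locate the marks and to remove them before comparing names. The helper names are my own, and ignoring the marks for comparison is an assumption about what an application might want, not something these notes prescribe.

```python
import unicodedata

# U+0610..U+0614: the honorific signs and the takhallus sign.
WORD_LEVEL_MARKS = {chr(cp) for cp in range(0x0610, 0x0615)}


def find_word_marks(text):
    """Return (index, Unicode name) for each word-level mark in text."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in WORD_LEVEL_MARKS
    ]


def strip_word_marks(text):
    """Return text with the word-level marks removed, eg. for comparison."""
    return "".join(ch for ch in text if ch not in WORD_LEVEL_MARKS)


# Assumed spelling: meem, hah, meem, shadda, dal, then the mark U+0610.
name = "\u0645\u062D\u0645\u0651\u062F\u0610"

if __name__ == "__main__":
    print(find_word_marks(name))
    print(strip_word_marks(name) == "\u0645\u062D\u0645\u0651\u062F")
```

All five characters have general category Mn, so in memory each one follows the letter it sits over, wherever in the word that letter happens to be.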
060E: ARABIC POETIC VERSE SIGN
Urdu Often used to mark the beginning of poetic verse. For an example see Figure 8 in Jonathan Kew's examples.
Do not confuse this with 0602 ARABIC FOOTNOTE MARKER.
06F0: EXTENDED ARABIC-INDIC DIGIT ZERO
Urdu Called [sɪfar].
Digit ۰
06F1: EXTENDED ARABIC-INDIC DIGIT ONE
Urdu Called [ek].
Digit ۱