I am compiling these notes as I explore the Arabic script as used for Urdu. They may be updated from time to time.
The page lists the Unicode characters used to represent Urdu text, and briefly describes their use. It starts with short notes on general script features and discussions about which Unicode characters are most appropriate when there is a choice. These notes are still in development.
For more detailed information, especially about the history and phonology of Urdu, follow the links in the text and at the bottom of the page.
To view this page as intended, you should download the (free) Nafees Nastaliq font from the Web.
Urdu uses the Arabic script with extensions. A number of the extensions are based on those developed for Persian (Farsi).
The script type is abjad, ie. the script is largely consonantal and short vowel sounds are typically not shown. Some of the consonant characters double as long vowels (eg. ی and و). The vowels are not usually clearly defined, but when necessary, vowel information can be represented by combining marks appearing above or below the base consonant. The absence of a vowel and doubling of consonants can be indicated in the same way.
Urdu has a much wider repertoire of sounds than Arabic, so several letters have been added to the basic Arabic script, many of them via Persian. The alphabet also includes aspirated letters, each of which has to be composed from two Unicode characters, and a letter je that uses different Unicode characters depending on the context.
Although it is not always possible to guess the vowel sounds in a word, the consonants are largely reliable phonetically: there is mostly a one-to-one correspondence between letters and sounds.
Since the script is cursive (ie. letters are typically joined) the letter forms can vary considerably according to position.
Urdu is typically written in a nasta'liq style; ie. the connected letters in a word tend to follow a sloping baseline. In Unicode this is achieved by choosing an appropriate font; the underlying characters are the same for nasta'liq and other styles.
The absence of a vowel sound can be indicated with a diacritic called sukūn or jazm, although this diacritic is not normally shown in text, eg. سَخْت saxt hard.
It has various possible forms, including a small round circle, something that looks like peʃ, and something like a circumflex.
This diacritic is never written above the final character in a word, because as a rule a short vowel is not pronounced in this position.
Consonant sounds can be lengthened. In vowelled text, which is very rare, this is shown using a diacritic called taʃdiːd, eg. ستّر sattar, seventy. More often than not, this is not written.
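Since vowelled text is rare, software that searches or compares Urdu text may want to ignore these optional marks. The following Python sketch removes the marks discussed on this page: the tanwīn marks (U+064B to U+064D), zabar, peš, zer, tašdīd, sukūn, and the inverted peš (U+0657). The function name and the exact set of marks are my own assumptions, not a standard; madd (U+0653) and hamza above (U+0654) are deliberately kept, because they are part of the spelling rather than optional pronunciation hints.

```python
# Optional marks assumed here (not an exhaustive or official list):
# U+064B..U+064D tanwin, U+064E zabar, U+064F pesh, U+0650 zer,
# U+0651 tashdid, U+0652 sukun, U+0657 inverted pesh.
OPTIONAL_MARKS = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0657"}


def strip_optional_marks(text: str) -> str:
    """Return text with the optional vowel marks removed.

    Madd (U+0653) and hamza above (U+0654) are not in the set, so they
    survive, because they belong to the spelling itself.
    """
    return "".join(ch for ch in text if ch not in OPTIONAL_MARKS)


if __name__ == "__main__":
    vowelled = "\u0633\u064E\u062E\u0652\u062A"  # sakht (hard), with zabar and sukun
    print(strip_optional_marks(vowelled) == "\u0633\u062E\u062A")  # True
```

A search index built this way would match both the vowelled and the unvowelled spelling of a word.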
There are 10 vowel sounds, though there are also allophonic variants. They are usually grouped into pairs of 'short' and 'long' sounds - although the difference is qualitative, rather than just length. The basic phonemes are as follows:
| 'short' | ə* | ɪ | ʊ | ɛ | ɔ |
|---|---|---|---|---|---|
| 'long' | ɑː | iː | uː | e | o |
The following table shows the standard ways of indicating vowel sounds when diacritics are used. Note, however, that context (such as a following 'ain or he) can change the value of a vowel diacritic; these cases are detailed below the table. Three short vowels are not typically found in final position. The examples show only the diacritics for the sound being discussed.
| sound | base component | final | medial | initial |
|---|---|---|---|---|
| ə | zabar | | بَب bəb | اَب əb |
| ɪ | zer | | دِن dɪn | اِن ɪn |
| ʊ | peʃ | | سُست sʊst | اُس ʊs |
| ɑː | alɪf | لکھنا lɪkʰnɑː | باغ bɑːɣ | آج ɑːʤ |
| e | je | بجے baʤe | بیٹا beʈɑː | ایک ek |
| iː | zer+je / je | گاری gɑːriː | تِین tiːn | اِینٹ iːnʈ |
| ɛ | zabar+je | ہَے hɛ | کَیسا kɛsɑː | اَیسا ɛsɑː |
| o | vɑːuː | کو ko | ٹوپی ʈopiː | اوس os |
| uː | peʃ+vɑːuː or vɑːuː+inverted peʃ | ہندُو hɪnduː; ہندوٗ hɪnduː | پُورا puːrɑː; پوٗرا puːrɑː | اُوپر uːpar; اوٗپر uːpar |
| ɔ | zabar+vɑːuː | نَو nɔ | شَوق ʃɔq | اَور ɔr |
'ain The letter ع is used in words of Arabic origin. In these words it is typically not pronounced but can support vowels. In this way, at the beginning of a word it can fulfill the same function as the alif, eg. عَرب arab Arab. The Urdu word اَرَب arab necessity, though pronounced the same, becomes a completely different word by its spelling. Note, in particular, that the equivalent of آ (alif+madd) ɑː is عا, as in عادت ɑːdat habit.
A following ع may also affect a short vowel diacritic to produce a long vowel sound as follows:
ɑː from zabar followed by 'ain, eg. بَعد bɑːd after
e from zer followed by 'ain, eg. شِعر ʃer verse
o from peʃ followed by 'ain, eg. شُعلہ ʃolɑː flame
choṭī he and baṛī he The letters ہ and ح can also modify preceding short vowels as follows:
ɛ from zabar followed by he, eg. اَحمد ɛhmad Ahmed, رَہنا rɛhnɑː to remain
ɛ from zer followed by he, eg. مِہربانی mɛhrbɑːniː kindness, and واضِح vɑːzɛh clear
ɔ from peʃ followed by he, eg. شُہرت ʃɔhrat fame, and توجُّہ tavajːɔh attention
The so-called 'silent' he that appears at the end of many words of Arabic or Persian derivation is pronounced ɑː, eg. مکَہ makːɑː Mecca.
Vowels may be nasalised, as at the end of the French word élan. This is indicated in Urdu by a glyph called nun ghunna that looks like the letter nun, except that in word-final position it has no dot, eg. ماں mãː, mother, ٹاںگ tãːg leg, and کروں karũː, I may do. In Unicode, nun ghunna has its own character (U+06BA), distinct from the letter nun (U+0646), and it is used for nasalisation whether or not the dot is shown.
Hamzā plays more than one role in Urdu. One such role is to indicate the boundaries between vowel sounds when there is no intervening consonant. Depending on the vowels concerned, it is used in a number of different ways. It can also have two different shapes, one like the initial form of 'ain and the other more like an italic 's'.
In this example we see hamza in its isolated form, انشاءﷲ ɪnʃalːaː God willing.
When the second vowel is an iː or e represented by ی or ے, the hamzā 'sits on a chair' before the letter representing the second vowel, eg. کئی kaiː several; تیئیس teiːs twenty-three; کوئی koiː someone; گئے gae they went; گائے gɑːe they sang.
The short vowel ɪ as a second vowel is also represented by hamzā 'on its chair', eg. کوئلہ koɪlɑː coal; لائن lɑːɪn queue.
To represent hamzā 'on a chair' in initial or medial positions with the Nafees Nastaleeq font, you can use U+0626 ARABIC LETTER YEH WITH HAMZA ABOVE.
When the second vowel is an uː or o represented by و, the hamzā typically sits directly on top of the و, eg. آؤ ɑːo come; جاؤں ʤɑːũː I may go. Note that often the hamzā is omitted in this situation. To represent this in Unicode use 0624: ARABIC LETTER WAW WITH HAMZA ABOVE.
Many words have the vowel combinations iːɑ̃ iːe iːo, where hamzā is not typically used, eg. لڑکیاں laɽkiːɑ̃ː girls; چلیے ʧaliːe come on; لڑکیوں کا laɽkiːõ kɑː of the girls.
Hamzā is also used to represent izāfat when the preceding word ends in either choṭī he or ye (see below).
Izāfat ɪzɑːfat is the name given to the short vowel ɛ used to describe a relationship between two words. It may be translated as 'of', as in Lion of the Punjab.
This sound is mostly represented using zer. Sometimes, however, the combining mark is not shown, even though pronounced. Examples: شیرِ پنجاب ʃer ɛ panʤɑːb Lion of the Panjab; طالبِ علم tɑːlɪb ɛ ɪlm seeker of knowledge (a student).
Izāfat is represented by a combining hamzā when the preceding word ends in either choṭī he ہ or ye ی: eg. قطرۂ آب qatrah ɛ ɑːb drop of water; ولئکامل valiː ɛ kɑːmɪl perfect saint.
Izāfat may also be shown as ے with or without a combining hamzā when the preceding word ends in a long vowel: eg. صدا ۓ بلند sadɑː ɛ buland a high voice; رو ۓ زمین ruː ɛ zamiːn the surface of the ground.
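The choice of izāfat representation described above can be sketched as a small Python function. This is a hypothetical helper, not an established API: it follows the rules on this page and uses U+06C2 for words ending in choṭī he (see the note under that character). For words ending in ye it assumes that the combining U+0654 ARABIC HAMZA ABOVE is appended, which is only one of the possible encodings; for long vowels it uses ۓ U+06D3 rather than a plain ے.

```python
# Hypothetical helper following the izafat rules on this page.
ZER = "\u0650"               # ARABIC KASRA (zer): the usual izafat mark
HAMZA_ABOVE = "\u0654"       # ARABIC HAMZA ABOVE; assumed encoding after ye
HEH_GOAL = "\u06C1"          # choti he
HEH_GOAL_HAMZA = "\u06C2"    # choti he with hamza (see the note under U+06C2)
FARSI_YEH = "\u06CC"         # ye
YEH_BARREE_HAMZA = "\u06D3"  # bari ye with hamza
LONG_VOWEL_ENDINGS = {"\u0627", "\u0648"}  # alif, vau


def join_with_izafat(first: str, second: str) -> str:
    """Return first + izafat marker, a space, then second."""
    if not first:
        raise ValueError("first word is empty")
    last = first[-1]
    if last == HEH_GOAL:
        # Replace the final choti he with its hamza form.
        head = first[:-1] + HEH_GOAL_HAMZA
    elif last == FARSI_YEH:
        # Assumption: append a combining hamza to the final ye.
        head = first + HAMZA_ABOVE
    elif last in LONG_VOWEL_ENDINGS:
        # Long vowel ending: add bari ye with hamza.
        head = first + YEH_BARREE_HAMZA
    else:
        # Default case: zer on the final consonant.
        head = first + ZER
    return head + " " + second


if __name__ == "__main__":
    # sher + panjab -> sher-e-panjab (Lion of the Punjab)
    print(join_with_izafat("\u0634\u06CC\u0631", "\u067E\u0646\u062C\u0627\u0628"))
```

In real text the zer is often omitted even where izāfat is pronounced, so a reader-facing tool might make the default case optional.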
The pronunciation of ال (alif followed by lām) varies when it represents the Arabic definite article. This affects many words in Urdu that have come from Arabic, in particular names and adverbial expressions.
The lām is not pronounced if it precedes one of the following characters: ت062A te, ث062B se, د062F dāl, ذ0630 zāl, ر0631 re, ز0632 ze, س0633 sīn, ش0634 šīn, ص0635 svād, ض0636 zvād, ط0637 toe, ظ0638 zoe, ل0644 lām, ن0646 nūn. Instead, the following sound is doubled. A tašdīd may sometimes be used to indicate this. Example: السلام علیکم asːalɑːm alaikum greetings.
Often the alif is not pronounced after a short preceding word that ends in a vowel. If the preceding vowel was long, it is shortened in this process. Examples: بالکل bɪlkul absolutely; فی الحال filhɑːl at present.
Often the vowel is pronounced ʊ, eg. دارالحکومت dɑːrʊlhʊkuːmat capital.

U+0627 ARABIC LETTER ALEF
Urdu vowel, alif alɪf
a/ɪ/u on its own in word initial position.
iː/e/ɛ word initial, combined with a following ye, ای
uː/o/ɔ word initial, combined with a following vāū, او
ɑː with madd آ, but see 0622 ARABIC LETTER ALEF WITH MADDA ABOVE for this.
ʊ/∅ sometimes as part of the Arabic definite article (see below).
ɑː elsewhere, unless part of the Arabic definite article (see below).
The alternative sounds possible in the initial combinations can be disambiguated, when necessary, by the use of combining marks. The combining marks are rarely used in normal text (with the exception of madd shown above). See a table of combining marks for vowels.
Arabic definite article The pronunciation of ال (alif followed by lām) varies when it represents the Arabic definite article. This affects many words in Urdu that have come from Arabic, in particular names and adverbial expressions.
Often the alif is not pronounced after a short preceding word that ends in a vowel. If the preceding vowel was long, it is shortened in this process. Examples: بالکل bɪlkul (absolutely); فی الحال filhɑːl (at present).
Often the vowel is pronounced ʊ, eg. دارالحکومت dɑːrʊlhʊkuːmat (capital).
(The lam may also not be pronounced. See 0644 ARABIC LETTER LAM.)
Refs: Matthews; Delacy

U+0628 ARABIC LETTER BEH
Urdu consonant, be
b
Looks like: ببب ب
bʰe together with 06BE: ARABIC LETTER HEH DOACHASHMEE, to represent the aspirated b in Urdu, a distinct letter of the Urdu alphabet called bhe.
Looks like: بھبھبھ بھ
Refs: Matthews; Delacy

U+067E ARABIC LETTER PEH
Notes from the Unicode standard:
• Persian, Urdu, ...
Urdu consonant, pe
p
Looks like: پپپ پ
pʰ together with 06BE: ARABIC LETTER HEH DOACHASHMEE, to represent the aspirated p in Urdu, a distinct letter of the alphabet called phe.
Looks like: پھپھپھ پھ.
Refs: Matthews; Delacy

U+062A ARABIC LETTER TEH
Urdu consonant, te
t
Looks like: تتت ت
tʰ together with 06BE: ARABIC LETTER HEH DOACHASHMEE, to represent the aspirated t in Urdu, a distinct letter of the alphabet called the (tʰe).
Looks like: تھتھتھ تھ.
Refs: Matthews; Delacy

U+0679 ARABIC LETTER TTEH
Notes from the Unicode standard:
• Urdu
Urdu consonant, ṭe ʈe
ʈ
Looks like: ٹٹٹ ٹ
ʈʰ together with 06BE: ARABIC LETTER HEH DOACHASHMEE, to represent the aspirated retroflex t in Urdu, a distinct letter of the alphabet called ṭhe.
Looks like: ٹھٹھٹھ ٹھ.
Refs: Matthews; Delacy

U+062B ARABIC LETTER THEH
Urdu consonant, se se
s Only occurs in words of Arabic and Persian origin, and is much less common than س 0633 ARABIC LETTER SEEN, which is also pronounced s.
Looks like: ثثث ث
Refs: Matthews; Delacy

U+062C ARABIC LETTER JEEM
Urdu consonant, jīm ʤiːm
ʤ
Looks like: ججج ج
ʤʰ together with 06BE: ARABIC LETTER HEH DOACHASHMEE, to represent the aspirated ʤ in Urdu, a distinct letter of the alphabet called jhe.
Looks like: جھجھجھ جھ.
Refs: Matthews; Delacy

U+0686 ARABIC LETTER TCHEH
Notes from the Unicode standard:
• Persian, Urdu, ...
Urdu consonant, ce ʧe
ʧ
Looks like: چچچ چ
ʧʰ together with 06BE: ARABIC LETTER HEH DOACHASHMEE, to represent the aspirated ʧ in Urdu, a distinct letter of the alphabet called che.
Looks like: چھچھچھ چھ.
Refs: Matthews; Delacy

U+062F ARABIC LETTER DAL
Urdu consonant, dāl dɑːl
d
Looks like: ـد د
dʰ together with 06BE: ARABIC LETTER HEH DOACHASHMEE, to represent the aspirated d in Urdu, a distinct letter of the alphabet called dhe.
Looks like: ـدھ دھ.
Refs: Matthews; Delacy

U+0688 ARABIC LETTER DDAL
Notes from the Unicode standard:
• Urdu
Urdu consonant, ḍāl ɖɑːl
ɖ
Looks like: ـڈ ڈ
ɖʰ together with 06BE: ARABIC LETTER HEH DOACHASHMEE, to represent the aspirated retroflex d in Urdu, a distinct letter of the alphabet called ḍhe.
Looks like: ـڈھ ڈھ.
Refs: Matthews; Delacy

U+0691 ARABIC LETTER RREH
Notes from the Unicode standard:
• Urdu
Urdu consonant, ṛe ɽe
ɽ
Looks like: ـڑ ڑ
ɽʰ together with 06BE: ARABIC LETTER HEH DOACHASHMEE, to represent the aspirated retroflex r in Urdu, a distinct letter of the alphabet called ṛhe.
Looks like: ـڑھ ڑھ.
Refs: Matthews; Delacy

U+0633 ARABIC LETTER SEEN
Urdu consonant, sīn siːn
s
Looks like: سسس س In Urdu nasta'liq text this can have two somewhat different shapes. The main part of the shape may be a wavy line, a little like a 'w', or can sometimes be a single swash, especially when two sīn characters are written together. Use the same character for both visual forms. When one or other of the possible shapes is desired, this should be produced by the font.
Refs: Matthews; Delacy

U+0634 ARABIC LETTER SHEEN
Urdu consonant, šīn ʃiːn
ʃ
Looks like: ششش ش In Urdu nasta'liq text this can have two somewhat different shapes. The main part of the shape may be a wavy line, a little like a 'w', or can sometimes be a single swash, especially when two šīn characters are written together. Use the same character for both visual forms. When one or other of the possible shapes is desired, this should be produced by the font.
Refs: Matthews; Delacy

U+0635 ARABIC LETTER SAD
Urdu consonant, svād svɑːd
s Only used in words of Arabic origin.
Looks like: صصص ص
Refs: Matthews; Delacy

U+0636 ARABIC LETTER DAD
Urdu consonant, zvād zvɑːd
z Only used in words of Arabic origin.
Looks like: ضضض ض
Refs: Matthews; Delacy

U+0637 ARABIC LETTER TAH
Urdu consonant, toe toe
t Only used in words of Arabic origin.
Looks like: ططط ط
Refs: Matthews; Delacy

U+0638 ARABIC LETTER ZAH
Urdu consonant, zoe zoe
z Only used in words of Arabic origin.
Looks like: ظظظ ظ
Refs: Matthews; Delacy

U+0639 ARABIC LETTER AIN
Notes from the Unicode standard:
→ (latin small letter ezh reversed - 01B9)
→ (modifier letter left half ring - 02BF)
Urdu consonant, 'ain ain.
∅ Not pronounced when preserved in Arabic words.
If it occurs at the beginning of a word, it can fulfill a similar role to alif, allowing words to begin with a vowel, but also allowing for alternative spellings for different words with the same pronunciation, eg. عرب arab (Arab) vs. ارب arab (necessity).
Note that a word-initial ɑː sound when the spelling begins with alif is written as alif with madd, eg. آج ɑːʤ (today). The same word-initial sound with 'ain is represented by 'ain followed by alif, eg. عادت ɑːdat (habit).
In non-word-initial positions an ain can cause a change in sound to preceding short vowels. This results in long vowels, but not always the long form typically associated with a given short form.
a short a becomes ɑː, eg. بعد bɑːd (after).
a short ɪ becomes e, eg. شعر ʃer (verse).
a short ʊ becomes o, eg. شعلہ ʃolɑː (flame).
ʔ occasionally between two vowels, although this is often lost in Urdu, eg. معاف mʊʔɑːf or mɑːf (forgiven); سعادت səʔɑːdət or sɑːdət (fortunate).
Looks like: ععع ع
Refs: Matthews, pp.xix, xxix; Delacy, pp.89-91

U+063A ARABIC LETTER GHAIN
Urdu consonant, ghain ɣain
ɣ
Used in words that came into Urdu from Arabic and Persian.
Looks like: غغغ غ
Refs: Matthews; Delacy

U+0642 ARABIC LETTER QAF
Urdu consonant, qāf qɑːf
q
Used in words that came into Urdu from Arabic and Persian.
Looks like: ققق ق
Refs: Matthews; Delacy

U+06A9 ARABIC LETTER KEHEH
Notes from the Unicode standard:
• Persian, Urdu, ...
Urdu consonant, kāf kɑːf
k
Looks like: ککک ک
When followed by alif or lām, this has a special rounded shape, eg. کا kɑː (of); کل kal (yesterday).
kʰ together with 06BE: ARABIC LETTER HEH DOACHASHMEE, to represent the aspirated k in Urdu, a distinct letter of the alphabet called khe.
Looks like: کھکھکھ کھ.
Refs: Matthews; Delacy

U+06AF ARABIC LETTER GAF
Notes from the Unicode standard:
• Persian, Urdu, ...
Urdu consonant, gāf gɑːf
g
Looks like: گگگ گ
When followed by alif or lām, this has a special rounded shape, eg. گام gɑːm (step); گل gul (rose).
gʰ together with 06BE: ARABIC LETTER HEH DOACHASHMEE, to represent the aspirated g in Urdu, a distinct letter of the alphabet called ghe.
Looks like: گھگھگھ گھ.
Refs: Matthews; Delacy

U+0644 ARABIC LETTER LAM
Urdu consonant, lām lɑːm
l
∅ when part of the Arabic definite article (see below).
Looks like: للل ل
Combined with a following alif, lām is usually written as لا, eg. گلاس gilɑːs (glass). Sometimes, however, especially in words of Arabic origin such as those formed with لا, the equivalent of the English prefix 'un-', the more Arabic form of لا is used, eg. لاعلاج lɑːʕilɑːʤ (incurable). Both shapes use the same two characters, U+0644 followed by U+0627; the difference in shape comes from the font.
Note that I can't find a way to make this example work with a single font. To produce it I had to mix two different fonts!
Arabic definite article The pronunciation of ال (alif followed by lām) varies when it represents the Arabic definite article. This affects many words in Urdu that have come from Arabic, in particular names and adverbial expressions.
The lām is not pronounced if it precedes one of the following characters: ت062A te, ث062B se, د062F dāl, ذ0630 zāl, ر0631 re, ز0632 ze, س0633 sīn, ش0634 šīn, ص0635 svād, ض0636 zvād, ط0637 toe, ظ0638 zoe, ل0644 lām, ن0646 nūn. Instead, the following sound is doubled. A tašdīd may sometimes be used to indicate this. Example: السلام علیکم asːalɑːm alaikum (greetings).
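The rule above amounts to a lookup on the letter that follows the article. The following Python sketch takes a word that is already known to begin with the Arabic article and reports whether its lām is silent. The function name is hypothetical, and the function cannot tell the article from an ordinary word that merely happens to start with alif + lām; that judgement is left to the caller.

```python
import unicodedata

ALEF_LAM = "\u0627\u0644"  # alif + lam

# Letters before which the lam of the article is not pronounced (list above):
# te, se, dal, zal, re, ze, sin, shin, svad, zvad, toe, zoe, lam, nun.
SILENT_LAM_BEFORE = set(
    "\u062A\u062B\u062F\u0630\u0631\u0632\u0633\u0634"
    "\u0635\u0636\u0637\u0638\u0644\u0646"
)


def article_lam_is_silent(word: str) -> bool:
    """Return True if the article at the start of word has an unpronounced lam."""
    if not word.startswith(ALEF_LAM):
        raise ValueError("word does not begin with alif + lam")
    for ch in word[len(ALEF_LAM):]:
        # Skip combining marks such as a sukun written on the lam itself.
        if unicodedata.category(ch) == "Mn":
            continue
        return ch in SILENT_LAM_BEFORE
    return False


if __name__ == "__main__":
    print(article_lam_is_silent("\u0627\u0644\u0633\u0644\u0627\u0645"))  # True: as-salam
    print(article_lam_is_silent("\u0627\u0644\u062D\u0627\u0644"))       # False: al-hal
```

When the lām is silent, the following consonant is doubled in pronunciation, which is what an optional tašdīd on that consonant would indicate.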
The article can also affect the pronunciation of the alif. See 0627 ARABIC LETTER ALEF.
Refs: Matthews; Delacy

U+0646 ARABIC LETTER NOON
Urdu consonant, nūn nuːn
n
Looks like: ننن ن
Within a word this looks exactly the same as U+06BA ARABIC LETTER NOON GHUNNA, which is used for nasalization of vowels, eg. ٹاںگ ʈɑː̃g (leg).
Refs: Matthews; Delacy

U+0648 ARABIC LETTER WAW
Urdu consonant / vowel, vāū vɑːuː
β as consonant, eg. والد vaːlɪd (father), نومبر navambar (November).
uː or o or ɔ as a vowel, whether word initial after alif, او, or elsewhere on its own, eg. اوپر uːpər (above); لوگ log (people); شوق ʃɔq (keenness). The alternative vowel sounds can be disambiguated, when necessary, by the use of combining marks. The combining marks are rarely used in normal text. See a table of combining marks for vowels.
∅ in a number of words of Persian origin beginning with خوا, eg. خواب xɑːb (dream).
[ʊ] in two very common words: خود xʊd (self), and خوش xʊʃ (happy).
Looks like: ـو و.
Refs: Matthews, pp. xxii-xxiv; Delacy

U+06C1 ARABIC LETTER HEH GOAL
Notes from the Unicode standard:
• Urdu
Urdu consonant, choṭī he ʧʰoʈiː he
h
ɑː as 'silent he' (see below).
ɛ occasionally as a variant of 'silent he' (see below).
∅ when doubled at the end of a word (see below).
Silent he: In Urdu words this letter is pronounced ɑː at the end of a word. Many Arabic and Persian words end in a he that is pronounced ɑː (just like alif), eg. مکّہ məkkɑː (Mecca).
A word like rɑːʤɑː (king), can be spelled with either an alif or a he, ie. راجا or راجہ. This is because the original Indian word was borrowed into Persian, then back into Urdu. Both spellings are now acceptable.
In a few words, the pronunciation of silent he is irregular, eg. کہ kɛ (that) and نہ nə (no).
Doubled he: In order to distinguish some words where the final h is pronounced rather than representing ɑː (or ɛ in irregular pronunciations), the choṭī he is sometimes doubled, eg. کہہ kɛh (say) vs. کہ kɛ.
Aspiration: Until recently choṭī he ہ and do cašmī he ھ could be used interchangeably, eg. ہاں or ھاں for hãː (yes). Modern practice is to use the latter exclusively for aspiration, though people do still occasionally confuse the two.
Vowel changes: choṭī he can change the preceding vowel as follows:
a to ɛ, eg. رَہنا rɛhnɑː (to remain).
ɪ to ɛ, eg. مہربانی mɛhrbɑːniː (kindness).
ʊ to o, eg. شہرت ʃohrət (fame).
Looks like: ہہہ ہ
The initial form is written with a hook beneath, eg. ہندو hinduː (Hindu). The medial form can be written with or without the hook, eg. کہاں kahɑ̃ː (where).
A special initial form is used before alif or lam, eg. ہاں hãː (yes), and اہل ahl (people).
Refs: Matthews, pp. xxiv-xxvi,xxviii-xxix; Delacy,pp.104-105

U+06CC ARABIC LETTER FARSI YEH
Notes from the Unicode standard:
• Arabic, Persian, Urdu, Kashmiri, ...
• initial and medial forms of this letter have dots
→ (arabic letter alef maksura - 0649)
→ (arabic letter yeh - 064A)
Urdu consonant / vowel, ye je
The Urdu letter je has two distinct visual forms requiring the use of two Unicode characters: this one ی and ے. For more information on the latter, see 06D2 ARABIC LETTER YEH BARREE.
j as a consonant (word initial or medial), eg. یار jɑːr (friend) and سایہ sɑːjɑː (shadow).
iː or e or ɛ as an initial or medial vowel (initially it is used after alif, ای), eg. ایک ek (one), سینہ siːnɑː (breast), and کیسا kɛsɑː (how).
The alternative vowel sounds can be disambiguated, when necessary, by the use of combining marks. The combining marks are rarely used in normal text. See a table of combining marks for vowels.
iː in word final position, eg. لڑکی ləɽkiː (girl).
To represent the vowels e or ɛ in final position or in the isolated form, 06D2 ARABIC LETTER YEH BARREE ے is used, eg. لڑکے ləɽke boys.
Looks like: ییی ی
This character has two dots below it in initial and medial position, but no dots in final or independent form.
Refs: Matthews; Delacy

U+06D2 ARABIC LETTER YEH BARREE
Notes from the Unicode standard:
• Urdu
Urdu vowel, baṛī ye baɽiː je
The Urdu letter je has two distinct visual forms requiring the use of two Unicode characters: this one ے and ی. For more information on the latter, see 06CC ARABIC LETTER FARSI YEH. The latter represents both a consonant and a vowel, but this form is used only for vowels. This form is used only in word final or isolated position.
e or ɛ in word-final or isolated position, eg. لڑکے laɽke, (boys).
The alternative sounds (e or ɛ) can be disambiguated, when necessary, by the use of combining marks. The combining marks are rarely used in normal text. See a table of combining marks for vowels.
Looks like: ـے ے.
This shape is also used with a hamza to represent the izāfat ɪzɑːfat. For this you should use 06D3 ARABIC LETTER YEH BARREE WITH HAMZA ABOVE.
Refs: Matthews; Delacy

U+0621 ARABIC LETTER HAMZA
Notes from the Unicode standard:
→ (modifier letter right half ring - 02BE)
Urdu vowel separator / calendar indicator, hamzā hamzaː
This is the character code for the standalone hamza. The hamza is also used in conjunction with other characters in Urdu, for which there are precomposed characters that can be used. See ؤ 0624 ARABIC LETTER WAW WITH HAMZA ABOVE, ئ 0626 ARABIC LETTER YEH WITH HAMZA ABOVE, ۓ 06D3 ARABIC LETTER YEH BARREE WITH HAMZA ABOVE, and ۂ 06C2 ARABIC LETTER HEH GOAL WITH HAMZA ABOVE. Note also that ٫ 066B ARABIC DECIMAL SEPARATOR looks like a hamza, but isn't one.
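Each precomposed character listed above is canonically equivalent to a base letter followed by a combining mark, so Unicode normalization converts between the two spellings. This Python sketch uses the standard unicodedata module to show each decomposition (the helper names are my own). It also illustrates a caveat worth checking in real data: the usual Urdu ye, U+06CC ARABIC LETTER FARSI YEH, has no precomposed hamza form (U+0626 decomposes to U+064A, the Arabic yeh), so NFC leaves U+06CC followed by U+0654 as two characters.

```python
import unicodedata


def code_points(s: str) -> str:
    """Format a string as space-separated hex code points, eg. '0648 0654'."""
    return " ".join(f"{ord(ch):04X}" for ch in s)


# Precomposed madd and hamza letters mentioned on this page.
PRECOMPOSED = ["\u0622", "\u0624", "\u0626", "\u06C2", "\u06D3"]


def decompositions() -> dict:
    """Map each precomposed letter's code point to its NFD code points."""
    table = {}
    for ch in PRECOMPOSED:
        nfd = unicodedata.normalize("NFD", ch)
        # NFC recombines the sequence into the single precomposed character.
        assert unicodedata.normalize("NFC", nfd) == ch
        table[code_points(ch)] = code_points(nfd)
    return table


if __name__ == "__main__":
    for composed, sequence in decompositions().items():
        print(composed, "->", sequence)
    # FARSI YEH has no precomposed hamza form, so NFC keeps two characters.
    print(code_points(unicodedata.normalize("NFC", "\u06CC\u0654")))
```

Run as a script, this should print 0622 -> 0627 0653, 0624 -> 0648 0654, 0626 -> 064A 0654, 06C2 -> 06C1 0654 and 06D3 -> 06D2 0654, followed by 06CC 0654. Normalizing both stored text and search strings to the same form (NFC or NFD) lets the two spellings match.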
A standalone hamza is sometimes used at the end of words derived from Arabic, though it is usually omitted in modern Urdu publications, eg. ضیاء ziaː light, ذکاء zakaː intelligence.
Vowel junctions: The hamzā is used to indicate the boundaries between vowel sounds when there is no intervening consonant. Depending on the vowels concerned, it is used in a number of different ways, usually combined with other characters.
In some cases this standalone form is used, eg. انشاءاللہ ɪnʃallaː God willing.
See other ways in which vowel junctions are formed when the hamza is combined with other characters.
Calendar indicator: Gregorian dates are indicated by placing sahn below the year digits with the word عیسوی iːsviː Christian era. This is usually abbreviated as a hamza, eg. ۲۰۰۴ء.

U+0624 ARABIC LETTER WAW WITH HAMZA ABOVE
Notes from the Unicode standard:
≡ 0648 0654
Urdu vowel separator+vowel
uː or o immediately after a preceding vowel (see below).
Vowel junctions: The hamzā is used to indicate the boundaries between vowel sounds when there is no intervening consonant. Depending on the vowels concerned, it is used in a number of different ways. It can also have two different shapes, one like the initial form of 'ain and the other more like an italic 's'.
When the second vowel is an uː or o represented by و, the hamzā typically sits directly on top of the و, eg. آؤ ɑːo come; جاؤں ʤɑːũː I may go. Often the hamzā is omitted in this situation.
Many words have the vowel combinations iːo, where hamzā is not typically used, eg. لڑکیوں کا laɽkiːõ kɑː of the girls.
See other ways in which vowel junctions are formed when dealing with other combinations of vowels.

U+0626 ARABIC LETTER YEH WITH HAMZA ABOVE
Notes from the Unicode standard:
≡ 064A 0654
Urdu vowel separator / vowel
ɪ or a when following a vowel, eg. کوئلہ koɪlɑː coal; لائن lɑːɪn queue; ہیئت hɛat astronomy. The hamza indicates that this vowel is pronounced separately from the preceding one.
iːɛ when used as izafat (see below).
Otherwise functions as a soundless vowel junction indicator ('hamza on its chair').
Vowel junctions: The hamza is used to indicate the boundaries between vowel sounds when there is no intervening consonant. Depending on the vowels concerned, it is used in a number of different ways. It can also have two different shapes, one like the initial form of 'ain and the other more like an italic 's'.
When the second vowel is an iː or e represented by ی or ے, the hamzā 'sits on a chair' before the letter representing the second vowel, eg. کئی kaiː several; تیئیس teiːs twenty-three; کوئی koiː someone; گئے gae they went; گائے gɑːe they sang.
Many words, however, have vowel combinations iːe, where hamzā is not typically used, eg. چلیے ʧaliːe come on.
See other ways in which vowel junctions are formed when dealing with other combinations of vowels.
Izāfat: ɪzɑːfat is the name given to the short vowel ɛ used to describe a relationship between two words. It may be translated as 'of', as in Lion of the Punjab.
This sound is mostly represented using zer, but in certain cases can be represented with a combining hamza. One such case occurs when the preceding word ends in ye ی: eg. ولئکامل valiː ɛ kɑːmɪl perfect saint.
There are other ways in which izafat can be formed.

U+06D3 ARABIC LETTER YEH BARREE WITH HAMZA ABOVE
Notes from the Unicode standard:
• Urdu
• actually a ligature, not an independent letter
≡ 06D2 0654
Urdu Izāfat ɪzɑːfat marker
ɛ
Izāfat is the name given to the short vowel ɛ used to describe a relationship between two words. It may be translated as 'of', as in Lion of the Punjab.
This sound is mostly represented using zer, but can also be represented with a combining hamza in a couple of cases.
Izāfat may also be shown as ے with or without a combining hamzā when the preceding word ends in a long vowel: eg. صدا ۓ بلند sadɑː ɛ buland a high voice; رو ۓ زمین ruː ɛ zamiːn the surface of the ground.
There are other ways in which izafat can be formed.
See also 06D2 ARABIC LETTER YEH BARREE.
Refs: Matthews; Delacy

U+06C2 ARABIC LETTER HEH GOAL WITH HAMZA ABOVE
Notes from the Unicode standard:
• Urdu
• actually a ligature, not an independent letter
≡ 06C1 0654
Urdu consonant with izafat, ɪzɑːfat
hɛ when used as izafat.
NOTE: The Unicode Standard indicates that this grapheme should be represented using U+06C0 ARABIC LETTER HEH WITH YEH ABOVE, however that doesn't work with the Nafees Nastaleeq font, and I have seen evidence elsewhere that in common use this HEH GOAL WITH HAMZA character is used for this purpose. Need to investigate further.
Izāfat ɪzɑːfat is the name given to the short vowel ɛ used to describe a relationship between two words. It may be translated as 'of', as in Lion of the Punjab.
This sound is mostly represented using zer, but in certain cases can be represented with a combining hamza. One such case occurs when the preceding word ends in choṭī he ہ: eg. قطرۂ آب qatrah ɛ ɑːb drop of water.
There are other ways in which izafat can be formed.

U+0622 ARABIC LETTER ALEF WITH MADDA ABOVE
Notes from the Unicode standard:
≡ 0627 0653
Urdu vowel, alif madd əlɪf mədd
ɑː (used word initially), eg. آب ɑːb water. Unlike the short vowel diacritics, the diacritic madd is never omitted.
As an exception, it is used in non-initial position in the word for Koran, القرآن.
madd means increasing.
See also 0627 ARABIC LETTER ALEF
Refs: Matthews; Delacy

U+06BA ARABIC LETTER NOON GHUNNA
Notes from the Unicode standard:
• Urdu
Urdu nasalisation indicator, nun ghunna nuːn ɣunna.
Looks like: ںںں ں
Indicates that the preceding vowel is nasalised.
At the end of a word, an undotted form is used, eg. ماں mãː, mother, کروں karũː, I may do.
Nasalization within a word uses a form with a dot that looks just like the letter ن 0646 ARABIC LETTER NOON, eg. ٹاںگ tãːg leg.
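Because medial nun ghunna and nun look identical, two people may type what appears to be the same word with different characters, and an exact string comparison will then fail. The Python sketch below shows the difference in code points and a hypothetical lenient search key that folds U+06BA into U+0646. This folding is an assumption suitable only for matching: it discards the distinction between nasalisation and a pronounced n, so its output should not be stored as text.

```python
NOON = "\u0646"         # ARABIC LETTER NOON
NOON_GHUNNA = "\u06BA"  # ARABIC LETTER NOON GHUNNA


def loose_search_key(word: str) -> str:
    """Hypothetical search key that treats nun ghunna as nun (matching only)."""
    return word.replace(NOON_GHUNNA, NOON)


leg_ghunna = "\u0679\u0627\u06BA\u06AF"  # 'leg' encoded with U+06BA
leg_noon = "\u0679\u0627\u0646\u06AF"    # same visual word typed with U+0646

if __name__ == "__main__":
    print(leg_ghunna == leg_noon)  # False: different code points
    print(loose_search_key(leg_ghunna) == loose_search_key(leg_noon))  # True
```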
This is not counted as a regular letter of the Urdu alphabet.

U+06BE ARABIC LETTER HEH DOACHASHMEE
Notes from the Unicode standard:
• Urdu
• forms aspirate digraphs
Urdu aspiration marker / calendar indicator, do cašmī he.
Aspiration: Used to create the aspirated letters of the Urdu alphabet. Each letter is composed of two characters. The letters are: بھ bʰe, پھ pʰe, تھ tʰe, ٹھ ʈʰe, جھ ʤʰe, چھ ʧʰe, دھ dʰe, ڈھ ɖʰe, ڑھ ɽʰe, کھ kʰe, and گھ gʰe.
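Since each aspirated letter is stored as two code points, a program that counts letters of the Urdu alphabet cannot simply use the string length. The following Python sketch builds the eleven digraphs listed above from their base letters and counts a word's letters, treating each digraph as one. The names are hypothetical, and the counter assumes unvowelled text (any combining vowel marks would be counted as letters).

```python
DO_CASHMI_HE = "\u06BE"  # ARABIC LETTER HEH DOACHASHMEE

# Base letters of the aspirates listed above, keyed by IPA symbol.
BASE_LETTERS = {
    "b": "\u0628", "p": "\u067E", "t": "\u062A", "ʈ": "\u0679",
    "ʤ": "\u062C", "ʧ": "\u0686", "d": "\u062F", "ɖ": "\u0688",
    "ɽ": "\u0691", "k": "\u06A9", "g": "\u06AF",
}

# Each aspirated letter is its base letter followed by do cashmi he.
ASPIRATES = {ipa + "ʰ": base + DO_CASHMI_HE for ipa, base in BASE_LETTERS.items()}
ASPIRATE_DIGRAPHS = frozenset(ASPIRATES.values())


def count_letters(word: str) -> int:
    """Count alphabet letters, treating each aspirate digraph as one letter."""
    count, i = 0, 0
    while i < len(word):
        # Consume two code points for an aspirate digraph, otherwise one.
        i += 2 if word[i:i + 2] in ASPIRATE_DIGRAPHS else 1
        count += 1
    return count


if __name__ == "__main__":
    likhna = "\u0644\u06A9\u06BE\u0646\u0627"  # likhna (to write)
    print(len(likhna), count_letters(likhna))  # 5 4
```

Note that only do cašmī he forms a digraph here; a base letter followed by choṭī he ہ is counted as two letters, in line with modern practice.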
Until recently choṭī he 06C1 ARABIC LETTER HEH GOAL ہ and do cašmī he could be used interchangeably to express aspiration, eg. ہاں or ھاں for hãː yes. Modern practice is to use this character exclusively for aspiration, though people do still occasionally confuse the two.
Calendar indicator: Dates using the Muslim calendar are followed by the word ہجری hɪʤriː, which is abbreviated with the symbol ھ.
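Both calendar indicators described on this page are suffixes after the year. The following Python sketch writes a year in the Extended Arabic-Indic digits (U+06F0 to U+06F9) seen in the example ۲۰۰۴ء and appends either hamza (Gregorian) or do cašmī he (Hijri). The function name is hypothetical, and attaching the marker directly to the digits, with no space, is an assumption based on that single example.

```python
# Extended Arabic-Indic digits U+06F0..U+06F9, as used in Urdu.
TO_URDU_DIGITS = str.maketrans("0123456789", "".join(chr(0x06F0 + d) for d in range(10)))

HAMZA = "\u0621"         # abbreviation for isvi (Gregorian / Christian era)
DO_CASHMI_HE = "\u06BE"  # abbreviation for hijri (Muslim calendar)


def urdu_year(year: int, calendar: str = "gregorian") -> str:
    """Return the year in Urdu digits followed by its calendar indicator."""
    if year < 0:
        raise ValueError("year must be non-negative")
    digits = str(year).translate(TO_URDU_DIGITS)
    if calendar == "gregorian":
        return digits + HAMZA
    if calendar == "hijri":
        return digits + DO_CASHMI_HE
    raise ValueError(f"unknown calendar: {calendar!r}")


if __name__ == "__main__":
    print(urdu_year(2004) == "\u06F2\u06F0\u06F0\u06F4\u0621")  # True
```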

U+064B ARABIC FATHATAN
Urdu vowel
an
This is a doubled zabar. These marks appear at the ends of certain Arabic adverbs. The doubled zabar is the most common of the three marks of this type. Although the mark appears over an alif, the vowel sound is short. Examples: یقیناً yaqiːnan (certainly); مثلاً masalan (for example).

U+064F ARABIC DAMMA
Urdu vowel, peš peʃ.
Rarely used; only where pronunciation needs to be spelled out. Indicates a vowel following its base character. peš means forward.
ʊ above a consonant, eg. بُب bʊb. At the beginning of a word it appears above alif or 'ain, eg. اُب ʊb.
When the base consonant is followed by certain other letters, peš represents different sounds, as shown below:
uː when followed by vɑːuː, eg. پُورا puːrɑː (full), and اُوپر uːpar (above).
o when followed by 'ain, eg. شُعلہ ʃolɑː (flame), and توقُّع tavaqːo (hope).
ɔ when followed by ʧʰoʈiː he or baṛī he, eg. شُہرت ʃɔhrat (fame), and توجُّہ tavajːɔh (attention).
ʊ, rather than a long vowel, in two very common words with a following vɑːuː: خُود xʊd (self), and خُوش xʊʃ (happy).
The word وہ vo (that, he, she, it) is irregular.
See a table of combining marks for vowels.

U+0650 ARABIC KASRA
Urdu vowel, zer zer
Rarely used; only where pronunciation needs to be spelled out. Indicates a vowel following its base character. zer means below.
ɪ below a base consonant, eg. بِب bɪb. At the beginning of a word it appears below alif or 'ain, eg. اِتْنَا ɪtnɑː (so much) and عِلْم ɪlm (knowledge).
When the base consonant is followed by certain other letters, zer represents different sounds, as shown below:
iː when followed by je, eg. سِینہ siːnɑː (breast), and اِیمان iːmɑːn (faith).
e when followed by ain, eg. شِعر ʃer (verse), and واقِع vɑːqe (situated).
ɛ when followed by ʧʰoʈiː he or baɽiː he, eg. مِہربانی mɛhrbɑːniː (kindness), and واضِح vɑːzɛh (clear).
See a table of combining marks for vowels.
ɪzāfat: ɪzɑːfat is the name given to the short vowel ɛ when it is used to express a relationship between two words. It can be translated as "of", as in "the Lion of Punjab".
This sound is mostly represented using zer. Sometimes, however, the combining mark is not shown, even though pronounced. Examples: شیرِ پنجاب ʃer ɛ panʤɑːb (Lion of the Panjab); طالبِ علم tɑːlɪb ɛ ɪlm (seeker of knowledge (a student)).
There are other ways in which ɪzāfat can be formed.
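The peš and zer rules above amount to a lookup on the mark and the letter that follows it. Below is a minimal Python sketch of that lookup. It rests on my own assumptions:

- "Followed by" means the next base letter, after skipping any other combining marks such as tašdīd.
- je means 06CC ARABIC LETTER FARSI YEH.

The sketch deliberately ignores the exceptions noted above (خود, خوش, وہ). The names `RULES` and `vowel_sounds` are hypothetical.

```python
import unicodedata

PESH = "\u064F"      # ARABIC DAMMA
ZER = "\u0650"       # ARABIC KASRA
VAO = "\u0648"       # و
AIN = "\u0639"       # ع
CHOTI_HE = "\u06C1"  # ہ
BARI_HE = "\u062D"   # ح
JE = "\u06CC"        # ی (assumed form of je for the zer rule)

# (vowel mark, following letter) -> sound, per the rules listed above.
RULES = {
    (PESH, VAO): "uː",
    (PESH, AIN): "o",
    (PESH, CHOTI_HE): "ɔ",
    (PESH, BARI_HE): "ɔ",
    (ZER, JE): "iː",
    (ZER, AIN): "e",
    (ZER, CHOTI_HE): "ɛ",
    (ZER, BARI_HE): "ɛ",
}
# Sound when no rule applies.
DEFAULT = {PESH: "ʊ", ZER: "ɪ"}


def vowel_sounds(word: str) -> list[str]:
    """Return the sound suggested by each peš or zer in the word.

    Lexical exceptions such as خُود xʊd are not handled.
    """
    sounds = []
    for i, ch in enumerate(word):
        if ch not in DEFAULT:
            continue
        # Skip further combining marks (e.g. tašdīd) to find the next letter.
        following = next(
            (c for c in word[i + 1:] if unicodedata.category(c) != "Mn"), None
        )
        sounds.append(RULES.get((ch, following), DEFAULT[ch]))
    return sounds


if __name__ == "__main__":
    print(vowel_sounds("\u067E\u064F\u0648\u0631\u0627"))  # پُورا
    print(vowel_sounds("\u0633\u0650\u06CC\u0646\u06C1"))  # سِینہ
```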

U+0651 ARABIC SHADDA
Urdu mark, tašdīd taʃdiːd.
Doubles the sound of the base consonant, eg. ستّر sattar (seventy). More often than not, it is not written.
tašdīd means strengthening.
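When tašdīd is combined with a vowel mark, the two marks can be typed in either order. This matters when searching or comparing text. Unicode normalization resolves it using each mark's canonical combining class. A small Python sketch using the standard `unicodedata` module:

```python
import unicodedata

BEH = "\u0628"     # ARABIC LETTER BEH
FATHA = "\u064E"   # zabar
SHADDA = "\u0651"  # tašdīd

# Each mark's canonical combining class; normalization sorts
# adjacent marks into ascending order of this value.
for mark in (FATHA, SHADDA):
    print(f"U+{ord(mark):04X} {unicodedata.name(mark)}: "
          f"ccc={unicodedata.combining(mark)}")

shadda_first = BEH + SHADDA + FATHA
fatha_first = BEH + FATHA + SHADDA

print(shadda_first == fatha_first)  # False: the raw sequences differ
print(unicodedata.normalize("NFC", shadda_first) == fatha_first)  # True
```

Because zabar (class 30) sorts before tašdīd (class 33), both NFC and NFD put the zabar first. Normalizing both strings before comparing them therefore makes the typing order irrelevant.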

U+064E ARABIC FATHA
Urdu vowel, zabar zəbər
Rarely used; only where pronunciation needs to be spelled out. Indicates a vowel following its base character. zabar means above.
ə above a consonant, eg. بَب bəb. At the beginning of a word it appears above alif or 'ain, eg. اَب əb (now), and عَرَب ərəb (Arab).
When the base consonant is followed by certain other letters, zabar represents different sounds, as shown below:
ɑː when followed by alif, silent choṭī he, or 'ain, eg. بَاغ bɑːɣ (garden), مکَہ makːɑː (Mecca), and بَعد bɑːd (after).
ɛ when followed by je (both forms), eg. جَیسا ʤɛsɑː (as), اَیسا ɛsɑː (such), and ہَے hɛ (is).
ɛ when followed by choṭī he or baṛī he, eg. اَحمد ɛhmad (Ahmed), and رَہنا rɛhnɑː (to remain).
ɔ when followed by vɑːuː, eg. شَوق ʃɔq (keenness), and اَور ɔr (and).
See a table of combining marks for vowels.
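Zabar, zer, peš, tašdīd and sukūn are all optional, so the same word may turn up with or without them. A search or comparison routine will probably want to ignore them. Here is a minimal Python sketch. The set of marks to strip is my own assumption; fathatan, for instance, is left alone because it is normally written.

```python
# Marks that are usually omitted in running Urdu text (assumed set).
OPTIONAL_MARKS = {
    "\u064E",  # ARABIC FATHA (zabar)
    "\u0650",  # ARABIC KASRA (zer)
    "\u064F",  # ARABIC DAMMA (peš)
    "\u0651",  # ARABIC SHADDA (tašdīd)
    "\u0652",  # ARABIC SUKUN (sukūn)
}


def strip_optional_marks(text: str) -> str:
    """Remove optional vowel and consonant marks, keeping everything else."""
    return "".join(ch for ch in text if ch not in OPTIONAL_MARKS)


def same_spelling(a: str, b: str) -> bool:
    """Compare two spellings while ignoring the optional marks."""
    return strip_optional_marks(a) == strip_optional_marks(b)


if __name__ == "__main__":
    print(same_spelling("\u0627\u064E\u0628", "\u0627\u0628"))  # اَب vs اب
```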

U+0652 ARABIC SUKUN
Notes from the Unicode standard:
• marks absence of a vowel after the base consonant
• used in some Korans to mark a long vowel as ignored
• can have a variety of shapes, including a circular one and a shape that looks like '06E1'
→ (arabic small high dotless head of khah - 06E1)
Urdu mark, sukūn sukuːn or jazm ʤazm.
Rarely used; indicates absence of a vowel between consonants, eg. سَخْت saxt (hard).
It has various possible forms, including a small round circle, something that looks like peʃ, and something like a circumflex. (There is another Unicode character that provides an alternative visual form, 06E1: ARABIC SMALL HIGH DOTLESS HEAD OF KHAH, but it is better to use this character and select the variant required using a font.)
This diacritic is never written above the final character in a word, because as a rule a short vowel is not pronounced in this position.
Sukūn is an Arabic word meaning rest or pause.
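Two of the points above could be checked mechanically: prefer 0652 over 06E1, and never put sukūn on the final letter of a word. The Python sketch below shows one way to do this, under my own assumptions. Words are split on whitespace, and trailing punctuation is not handled. The helper names are hypothetical.

```python
import unicodedata

SUKUN = "\u0652"                 # ARABIC SUKUN
DOTLESS_HEAD_OF_KHAH = "\u06E1"  # ARABIC SMALL HIGH DOTLESS HEAD OF KHAH


def prefer_sukun(text: str) -> str:
    """Replace 06E1 with 0652, leaving the choice of shape to the font."""
    return text.replace(DOTLESS_HEAD_OF_KHAH, SUKUN)


def words_with_final_sukun(text: str) -> list[str]:
    """Return words whose last letter carries a sukūn."""
    flagged = []
    for word in prefer_sukun(text).split():
        # Walk back over the combining marks that follow the last letter.
        end = len(word)
        while end > 0 and unicodedata.category(word[end - 1]) == "Mn":
            end -= 1
        if SUKUN in word[end:]:
            flagged.append(word)
    return flagged


if __name__ == "__main__":
    print(words_with_final_sukun("\u0633\u064E\u062E\u0652\u062A"))  # سَخْت
```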

U+0656 ARABIC SUBSCRIPT ALEF
Urdu mark
Used to indicate that the vowel is iː or i rather than e, eg. نُحْیٖ.
This diacritic is not usually needed, and serves only to emphasise that the vowel is long.

U+0657 ARABIC INVERTED DAMMA
Notes from the Unicode standard:
= ulta pesh
• Kashmiri, Urdu
Urdu mark
Used to indicate that the vowel is uː or ʊ rather than ɔ, eg. حبل حلالہٗ.
This diacritic is not usually needed, and serves only to emphasise that the vowel is long.

U+0658 ARABIC MARK NOON GHUNNA
Notes from the Unicode standard:
• Baluchi
• indicates nasalization in Urdu
Urdu mark
Nasalisation of Urdu vowels is normally indicated by 06BA ARABIC LETTER NOON GHUNNA ں. At the end of a word this has no dot above, but in the middle of a word it looks exactly like 0646 ARABIC LETTER NOON ن (and some people may mix up the use of these characters).
This diacritic is used when people want to make it clear that this glyph represents nasalisation rather than the letter nūn.
It is not used in a standard way, only when the writer prefers, and is fairly uncommon, eg. ساں٘گ. The CRULP fonts don't appear to display the diacritic as expected.
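A non-final ں is indistinguishable from ن, so an editor might want to find those positions and decide whether 0658 is worth adding. The Python sketch below does this with a hypothetical helper. It assumes that "the middle of a word" simply means another letter follows the ں.

```python
import unicodedata

NOON_GHUNNA = "\u06BA"       # ARABIC LETTER NOON GHUNNA (ں)
NOON_GHUNNA_MARK = "\u0658"  # ARABIC MARK NOON GHUNNA


def non_final_ghunna(word: str) -> list[tuple[int, bool]]:
    """List (index, has_mark) for each ں that is not the last letter."""
    hits = []
    for i, ch in enumerate(word):
        if ch != NOON_GHUNNA:
            continue
        # Skip the combining marks attached to this ں.
        j = i + 1
        while j < len(word) and unicodedata.category(word[j]) == "Mn":
            j += 1
        if j < len(word):  # another letter follows, so it looks like ن
            hits.append((i, NOON_GHUNNA_MARK in word[i + 1:j]))
    return hits


if __name__ == "__main__":
    print(non_final_ghunna("\u0633\u0627\u06BA\u0658\u06AF"))  # ساں٘گ
```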

U+0670 ARABIC LETTER SUPERSCRIPT ALEF
Notes from the Unicode standard:
• actually a vowel sign, despite the name
Urdu vowel
ɑː
Used in a few Arabic words over the final form of 06CC ARABIC LETTER FARSI YEH ی to produce the sound ɑː: eg. اعلیٰ alɑː (paramount, highest); دعویٰ davɑː (law suit, claim).

U+060C ARABIC COMMA
Notes from the Unicode standard:
• also used with Thaana and Syriac in modern text
→ (comma - 002C)
→ (turned comma - 2E32)
Urdu punctuation

U+060D ARABIC DATE SEPARATOR
Urdu date separator
Used in dates between the (numeric) date and the month name, eg. ۴؍صفر ۱۳۰۲ھ.
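The Python sketch below shows the order of the pieces in such a date by assembling one from a day number, a month name and a year. It makes two assumptions:

- The trailing ھ is 06BE ARABIC LETTER HEH DOACHASHMEE.
- There is no space between the year and ھ, as in the example.

`hijri_date` is a hypothetical name.

```python
# Map ASCII digits to EXTENDED ARABIC-INDIC DIGIT ZERO..NINE (U+06F0..U+06F9).
URDU_DIGITS = str.maketrans(
    "0123456789", "".join(chr(0x06F0 + d) for d in range(10))
)
DATE_SEPARATOR = "\u060D"  # ARABIC DATE SEPARATOR
HIJRI_SIGN = "\u06BE"      # ھ, abbreviation of ہجری (assumed code point)


def hijri_date(day: int, month: str, year: int) -> str:
    """Build a date like ۴؍صفر ۱۳۰۲ھ in logical (typing) order."""
    return (
        str(day).translate(URDU_DIGITS)
        + DATE_SEPARATOR
        + month
        + " "
        + str(year).translate(URDU_DIGITS)
        + HIJRI_SIGN
    )


if __name__ == "__main__":
    print(hijri_date(4, "\u0635\u0641\u0631", 1302))  # month صفر (Safar)
```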

U+061F ARABIC QUESTION MARK
Notes from the Unicode standard:
• also used with Thaana and Syriac in modern text
→ (question mark - 003F)
→ (reversed question mark - 2E2E)
Urdu punctuation

U+061B ARABIC SEMICOLON
Notes from the Unicode standard:
• also used with Thaana and Syriac in modern text
→ (semicolon - 003B)
→ (turned semicolon - 2E35)
Urdu punctuation

U+066B ARABIC DECIMAL SEPARATOR
Urdu punctuation, ašāriya əʃɑːrɪjɑ.
In Urdu this looks like a hamza ٫, eg. ۲۵۲۴٫۲۳ do hazɑːr pɑ̃ːʧ sau caubiːs aʃɑːrɪjɑː do tiːn (2524.23).
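Converting a Western-style decimal number is then just a character mapping. Here is a minimal Python sketch. The function name is my own, and it takes a string so that floating-point formatting cannot alter the digits.

```python
# ASCII digits -> U+06F0..U+06F9, and "." -> ARABIC DECIMAL SEPARATOR.
TO_URDU = str.maketrans(
    "0123456789.",
    "".join(chr(0x06F0 + d) for d in range(10)) + "\u066B",
)


def urdu_number(value: str) -> str:
    """Convert a string like '2524.23' to Urdu digits and separator."""
    return value.translate(TO_URDU)


if __name__ == "__main__":
    print(urdu_number("2524.23"))
```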

U+0601 ARABIC SIGN SANAH
Urdu symbol, sanh sənh.
Gregorian dates are indicated by placing this long sweep below the year digits, together with the word عیسوی iːsviː (Christian era), which is usually abbreviated as a hamza ء.
Dates using the Muslim calendar are followed by the word ہجری hɪʤriː, which is abbreviated with the symbol ھ.
The sanh sign is typed before the digits (in a right-to-left context), eg. ۲۰۰۴ء (2004). It is not a combining character, even though it displays beneath the digits.
The sanh is derived from the Arabic word for year سنة.
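In logical order, then, the sign comes first, then the digits, then the hamza abbreviation. The Python sketch below builds a Gregorian year that way. It assumes the hamza is 0621 ARABIC LETTER HAMZA, and `gregorian_year` is a hypothetical helper. Drawing the sweep under the digits is left entirely to the font.

```python
URDU_DIGITS = str.maketrans(
    "0123456789", "".join(chr(0x06F0 + d) for d in range(10))
)
SANAH = "\u0601"  # ARABIC SIGN SANAH, typed before the digits
HAMZA = "\u0621"  # ARABIC LETTER HAMZA, abbreviation of عیسوی


def gregorian_year(year: int) -> str:
    """Return sanah + Urdu digits + hamza, in the order they are typed."""
    return SANAH + str(year).translate(URDU_DIGITS) + HAMZA


if __name__ == "__main__":
    print(gregorian_year(2004))
```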

U+0603 ARABIC SIGN SAFHA
Urdu symbol, safah səfəh
Used to indicate a page number, where English would use an abbreviation such as "pp. 35-45", eg. ۴۵. The stroke may be elongated and pass under the number.
The symbol is derived from the stroke used for 0635: ARABIC LETTER SAD.

U+0600 ARABIC NUMBER SIGN
Urdu symbol
Used to indicate the beginning of a number, eg. ۱۲۳.
The stroke may be elongated and pass under the number, but this is not a combining character.

U+0602 ARABIC FOOTNOTE MARKER
Urdu symbol
Used to indicate that a number is a footnote, eg. ۵؎.
The number usually sits above the symbol, although this is not a combining character. Like 0601 ARABIC SIGN SANAH, it is typed before the number, not after it. None of the fonts I have display the number above it.
Do not confuse this with 060E ARABIC POETIC VERSE SIGN.
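The Unicode character properties confirm that none of these number-related signs is a combining mark. They are format characters (general category Cf) that precede the digits they apply to. Below is a quick check with Python's `unicodedata` module, plus a hypothetical helper that puts a sign in front of a number.

```python
import unicodedata

NUMBER_SIGNS = ["\u0600", "\u0601", "\u0602", "\u0603"]
URDU_DIGITS = str.maketrans(
    "0123456789", "".join(chr(0x06F0 + d) for d in range(10))
)

for sign in NUMBER_SIGNS:
    print(
        f"U+{ord(sign):04X} {unicodedata.name(sign)}: "
        f"category={unicodedata.category(sign)}, "
        f"combining class={unicodedata.combining(sign)}"
    )


def with_sign(sign: str, number: int) -> str:
    """Put the sign first, then the digits, which is the order they are typed."""
    return sign + str(number).translate(URDU_DIGITS)


if __name__ == "__main__":
    print(with_sign("\u0602", 5))  # footnote 5
```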

U+060F ARABIC SIGN MISRA
Urdu symbol misra misrə
Urdu poems are typically composed of couplets. This symbol marks a single line (misra) of a couplet (shayr) from an Urdu poem when that line is quoted in text.
It is placed at the beginning of the line, and is followed by the line of verse. See an example.

U+0610 ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM
Notes from the Unicode standard:
• represents sallallahu alayhe wasallam 'may God's peace and blessings be upon him'
Urdu Represents sallallahu alayhe wasallam [sallallao alae va sallam] (may God's peace and blessings be upon him) صلّى الله عليه وسلّم. Used over the name of Mohammed.
One of several marks that represent phrases expressing the status of a person, most having specifically religious meaning.
The mark is really associated with a word rather than a character, but its placement is left to the user. It is sometimes added somewhere in the middle of a name, but most commonly appears towards the end. This depends to some extent on the letter shapes present and the calligraphic style in use, eg. محمّدؐ [muhamːed sallallao alae va sallam].
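Unlike the number signs above, these honorific marks are combining characters (general category Mn). Whichever letter such a mark sits over, it is typed immediately after that letter and after any marks the letter already carries. The Python sketch below illustrates that placement. `attach_mark` and its letter counting are my own illustration, not a standard API.

```python
import unicodedata

SALLALLAHOU = "\u0610"  # ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM


def attach_mark(word: str, letter_index: int, mark: str) -> str:
    """Insert mark after the letter_index-th base letter (0-based),
    following any combining marks already on that letter."""
    seen = -1
    for i, ch in enumerate(word):
        if unicodedata.category(ch) != "Mn":
            seen += 1
            if seen == letter_index:
                # Skip existing marks (e.g. tašdīd) on this letter.
                j = i + 1
                while j < len(word) and unicodedata.category(word[j]) == "Mn":
                    j += 1
                return word[:j] + mark + word[j:]
    raise IndexError("word has fewer letters than letter_index + 1")


if __name__ == "__main__":
    muhammad = "\u0645\u062D\u0645\u0651\u062F"  # محمّد
    print(attach_mark(muhammad, 3, SALLALLAHOU))  # mark over the final د
```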

U+0611 ARABIC SIGN ALAYHE ASSALLAM
Notes from the Unicode standard:
• represents alayhe assalam 'upon him be peace'
Urdu Represents alayhe asallam [alejsallam] (upon him be peace) عليه السّلام. Used over the names of prophets other than Mohammed.
One of several marks that represent phrases expressing the status of a person, most having specifically religious meaning.
The mark is really associated with a word rather than a character, but its placement is left to the user. It is sometimes added somewhere in the middle of a name, but most commonly appears towards the end. This depends to some extent on the letter shapes present and the calligraphic style in use, eg. عیسؑیٰ [isaː salejsallam] Christ, upon him be peace!

U+0612 ARABIC SIGN RAHMATULLAH ALAYHE
Notes from the Unicode standard:
• represents rahmatullah alayhe 'may God have mercy upon him'
Urdu Represents rahmatulla alayhe [raːmatʊlla alee] (may God have mercy upon him) رحمت الله عليه. Used over the names of saints, religious authorities, and other deceased pious persons.
One of several marks that represent phrases expressing the status of a person, most having specifically religious meaning.
The mark is really associated with a word rather than a character, but its placement is left to the user. It is sometimes added somewhere in the middle of a name, but most commonly appears towards the end. This depends to some extent on the letter shapes present and the calligraphic style in use, eg. قاضی نور محمّدؒ [kaziː nur mamed rahmatulla alayhe] Qazi Nur Muhammad, may God have mercy upon him!

U+0613 ARABIC SIGN RADI ALLAHOU ANHU
Notes from the Unicode standard:
• represents radi allahu 'anhu 'may God be pleased with him'
Urdu Represents radi allahu 'anhu [raziallaːo ano] (may God be pleased with him) رضي الله عنه. Used over the names of the Companions of the Prophet.
One of several marks that represent phrases expressing the status of a person, most having specifically religious meaning.
The mark is really associated with a word rather than a character, but its placement is left to the user. It is sometimes added somewhere in the middle of a name, but most commonly appears towards the end. This depends to some extent on the letter shapes present and the calligraphic style in use, eg. ابوبکرؓ [abu bakr, raziallaːo ano] Abu Bakr, may God be pleased with him!

U+0614 ARABIC SIGN TAKHALLUS
Notes from the Unicode standard:
• sign placed over the name or nom-de-plume of a poet, or in some writings used to mark all proper names
Urdu Sign placed over the name or nom-de-plume of a poet, or in some writings used to mark all proper names.
The mark is really associated with a word rather than a character, but its placement is left to the user. It is sometimes added somewhere in the middle of a name, but most commonly appears towards the end. This depends to some extent on the letter shapes present and the calligraphic style in use, eg. عطاشادؔ [ataː ʃaːd] Ata Shad (author's name). There seems to be a problem displaying this with Nafees fonts.

U+060E ARABIC POETIC VERSE SIGN
Urdu Often used to mark the beginning of poetic verse. For an example see Figure 8 in Jonathan Kew's examples.
Do not confuse this with 0602 ARABIC FOOTNOTE MARKER.

U+FDFD ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM
Urdu symbol, bismillāhi rraḥmāni rraḥīm
بسم الله الرحمن الرحيم
This ligature is part of UZT 1.01, the national standard code page of the Government of Pakistan for Urdu. In Pakistan, all official government documents are required by law to begin with the Bismillah ligature.
This character is included in the Unicode Standard because the text is in Arabic, and it would be difficult for some Urdu speakers to type it as individual characters. It is in the Arabic compatibility character range (Arabic Presentation Forms-A), however, not the standard Arabic block.
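One practical consequence: FDFD has no decomposition mapping in the Unicode Character Database. Even compatibility normalization (NFKC) therefore leaves it as a single character, and a search for the phrase typed as separate letters will not find it. The Python sketch below illustrates this. The expansion string is simply the letter sequence shown above, and `expand_bismillah` is a hypothetical helper.

```python
import unicodedata

BISMILLAH_LIGATURE = "\uFDFD"
# The same phrase typed as separate Arabic letters and spaces.
BISMILLAH_LETTERS = "بسم الله الرحمن الرحيم"

# No decomposition mapping, so NFKC does not expand the ligature.
print(unicodedata.normalize("NFKC", BISMILLAH_LIGATURE) == BISMILLAH_LIGATURE)


def expand_bismillah(text: str) -> str:
    """Replace the ligature with its letters, e.g. before indexing text."""
    return text.replace(BISMILLAH_LIGATURE, BISMILLAH_LETTERS)
```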

U+06F4 EXTENDED ARABIC-INDIC DIGIT FOUR
Notes from the Unicode standard:
• Persian has a different glyph than Sindhi and Urdu
Urdu digit, cār ʧɑːr
Shape is different from Persian and Arabic. Looks like: ۴

U+06F5 EXTENDED ARABIC-INDIC DIGIT FIVE
Notes from the Unicode standard:
• Persian, Sindhi, and Urdu share glyph different from Arabic
Urdu digit, pāṅc pɑ̃ːʧ
Shape is different from Arabic. Looks like: ۵

U+06F6 EXTENDED ARABIC-INDIC DIGIT SIX
Notes from the Unicode standard:
• Persian, Sindhi, and Urdu have glyphs different from Arabic
Urdu digit, che ʧʰe
Shape is different from Arabic. Looks like: ۶

U+06F7 EXTENDED ARABIC-INDIC DIGIT SEVEN
Notes from the Unicode standard:
• Urdu and Sindhi have glyphs different from Arabic
Urdu digit, sāt sɑːt
Shape is different from Arabic. Looks like: ۷
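Urdu shares these code points with Persian, so the Urdu shapes of four and seven have to come from the font, not from the character codes. For processing, though, they behave as ordinary decimal digits. A small Python sketch of reading Urdu digits back into numbers (`from_urdu_digits` is a hypothetical helper):

```python
import unicodedata

for d in range(4, 8):
    ch = chr(0x06F0 + d)
    print(f"{ch} U+{ord(ch):04X} {unicodedata.name(ch)} "
          f"value={unicodedata.decimal(ch)}")


def from_urdu_digits(text: str) -> str:
    """Replace Unicode decimal digits (including U+06F0..U+06F9) with ASCII."""
    return "".join(
        str(unicodedata.decimal(ch)) if unicodedata.category(ch) == "Nd" else ch
        for ch in text
    )


if __name__ == "__main__":
    # Python's int() already accepts any Unicode decimal digits.
    print(int("\u06F2\u06F0\u06F0\u06F4"))
    print(from_urdu_digits("\u06F2\u06F5\u06F2\u06F4\u066B\u06F2\u06F3"))
```

Non-digit characters, such as the decimal separator 066B, pass through unchanged.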