ishida >> writing

Urdu script notes [Draft]

These notes are still in development. I am using them to explore the Arabic script as used for Urdu.

This page sets out to list the symbols used to represent Urdu text, describe their use, and relate them to appropriate characters for representation in Unicode. Along the way I also describe the basic phonology associated with the graphical symbols.

In some cases there is some discussion about which Unicode characters are most appropriate, and it was to address these questions that I originally embarked on this.

You need a font that supports nastaliq style for all Urdu characters. See the sidebar for links to the fonts I used. Alternatively you can view the PDF version of this document.

Brief script introduction

Urdu uses the Arabic script with extensions. A number of the extensions are based on those developed for Persian (Farsi).

The script type is abjad, ie. the script is largely consonantal and short vowel sounds are typically not shown. Some of the consonant characters double as long vowels (eg. ی and و). The vowels are not usually clearly defined, but when necessary, vowel information can be represented by combining marks appearing above or below the base consonant. The absence of a vowel and doubling of consonants can be indicated in the same way.
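Because these marks are optional, the same word can appear with or without them. As a rough illustration (a sketch, not a recommended normalization scheme), the Python below strips an assumed set of optional marks so that a vowelled and an unvowelled spelling compare equal. The chosen code points and the name `strip_optional_marks` are my own choices for the example. madː and hamza are deliberately left alone, because they are not optional in the same way.

```python
# Hypothetical helper. Assumed set of optional marks:
# U+064B..U+0652 (fathatan, dammatan, kasratan, zabar, peʃ, zer, taʃdiːd, sukuːn)
# and U+0656..U+0658 (subscript alef, inverted peʃ, mark nuːn ɣunːa).
# madː (U+0653) and hamza above (U+0654) are intentionally kept.
OPTIONAL_MARKS = frozenset(
    [chr(cp) for cp in range(0x064B, 0x0653)]
    + [chr(cp) for cp in range(0x0656, 0x0659)]
)


def strip_optional_marks(text: str) -> str:
    """Return text with the optional vowel and consonant marks removed."""
    return "".join(ch for ch in text if ch not in OPTIONAL_MARKS)


vowelled = "\u0633\u064E\u062E\u0652\u062A"  # سَخْت [saxt] hard, with zabar and sukuːn
plain = "\u0633\u062E\u062A"                 # سخت as normally written
print(strip_optional_marks(vowelled) == plain)  # True
```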

The basic alphabet covers a much wider repertoire of sounds than found in Arabic, so several extensions have been added to the basic Arabic script. Many of these come via Persian. The alphabet includes aspirated letters that have to be composed with two Unicode characters and a je letter that uses different Unicode characters depending on the context. The alphabet is as follows:

اببھپپھتتھٹٹھثججھچچھحخددھڈڈھرڑڑھزژسشصضطظعغفقککھگگھلمنوہی‎/ے

Some other characters are used that are not generally counted as part of the alphabet. These are:

alɪf with madː, nuːn ɣunːa, dochashmi he, ʧʰoʈiː he with hamza, baɽiː je with hamza, fathatan, dammatan, kasratan, peʃ, zer, taʃdiːd, zabar, sukuːn or ʤazm, superscript alef, subscript alef, inverted peʃ, and nuːn ɣunːa (as a combining mark).

In addition, there are:

  • punctuation characters: ،؟۔

  • digits: ۰۱۲۳۴۵۶۷۸۹

  • various signs and symbols: aʃɑːrɪjɑ, date separator, number sign, sanh, footnote marker, safah, misra, sallallahu alayhe wasallam, alayhe asallam, rahmatulla alayhe, radi allahu 'anhu, takhallus, and poetic verse sign.

Although it is not always possible to guess the vowel sounds in a word, the consonants are largely reliable phonetically. There is mostly a one-to-one correspondence between letters and sounds.

Shaping and style

Since the script is cursive (ie. letters are typically joined) the letter forms can vary considerably according to position.

Urdu is typically written in a nasta'liq style, ie. the connected letters in a word tend to follow a sloping baseline. In Unicode this is achieved by using an appropriate font; the underlying characters are the same for nasta'liq and other styles.

Conjuncts

The absence of a vowel sound can be indicated with a diacritic called sukuːn or ʤazm, although this diacritic is not normally shown in text, eg. سَخْت [saxt] hard.

It has various possible forms, including a small round circle, something that looks like peʃ, and something like a circumflex.

This diacritic is never written above the final character in a word, because as a rule a short vowel is not pronounced in this position.

Consonant lengthening

Consonant sounds can be lengthened. In vowelled text, which is very rare, this is shown using a diacritic called taʃdiːd, eg. ستّر [sattar], seventy. More often than not, this is not written.

Vowels

There are 10 vowel sounds, though there are also allophonic variants. They are usually grouped into pairs of 'short' and 'long' sounds, although the difference is one of quality, not just length. The basic phonemes are as follows:

short: ə* ɪ ʊ ɛ ɔ
long: ɑː iː uː e o
* The phoneme [ə] is written [a] in all phonemic transcriptions in this material. (This is the letter usually used in other sources too.)

The following table shows the standard ways of indicating vowel sounds when diacritics are used. Note, however, that context can change the value of a vowel diacritic (for example, a following ain or he); these cases are detailed below the table. The three short vowels are not typically found in final position. The examples show only the diacritics for the sound being discussed.

sound | final | medial | initial | base component | example: final | example: medial | example: initial
ə | - | 064E: ARABIC FATHA | 0627: ARABIC LETTER ALEF+064E: ARABIC FATHA | zabar | - | بَب [bəb] | اَب [əb]
ɪ | - | 0650: ARABIC KASRA | 0627: ARABIC LETTER ALEF+0650: ARABIC KASRA | zer | - | دِن [dɪn] | اِن [ɪn]
ʊ | - | 064F: ARABIC DAMMA | 0627: ARABIC LETTER ALEF+064F: ARABIC DAMMA | peʃ | - | سُست [sʊst] | اُس [ʊs]
ɑː | 0627: ARABIC LETTER ALEF | 0627: ARABIC LETTER ALEF | 0622: ARABIC LETTER ALEF WITH MADDA ABOVE | alɪf | لکھنا [lɪkʰnɑː] | باغ [bɑːɣ] | آج [ɑːʤ]
e | 06D2: ARABIC LETTER YEH BARREE | 06CC: ARABIC LETTER FARSI YEH | 0627: ARABIC LETTER ALEF+06CC: ARABIC LETTER FARSI YEH | je | بجے [baʤe] | بیٹا [beʈɑː] | ایک [ek]
iː | 06CC: ARABIC LETTER FARSI YEH | 06CC: ARABIC LETTER FARSI YEH+0650: ARABIC KASRA | 06CC: ARABIC LETTER FARSI YEH+0650: ARABIC KASRA | zer+je / je | گاری [gɑːriː] | تِین [tiːn] | اِینٹ [iːnʈ]
ɛ | 064E: ARABIC FATHA+06D2: ARABIC LETTER YEH BARREE | 064E: ARABIC FATHA+06CC: ARABIC LETTER FARSI YEH | 0627: ARABIC LETTER ALEF+064E: ARABIC FATHA+06CC: ARABIC LETTER FARSI YEH | zabar+je | ہَے [hɛ] | کَیسا [kɛsɑː] | اَیسا [ɛsɑː]
o | 0648: ARABIC LETTER WAW | 0648: ARABIC LETTER WAW | 0627: ARABIC LETTER ALEF+0648: ARABIC LETTER WAW | vɑːuː | کو [ko] | ٹوپی [ʈopiː] | اوس [os]
uː | 0648: ARABIC LETTER WAW+064F: ARABIC DAMMA, or 0648: ARABIC LETTER WAW+0657: ARABIC INVERTED DAMMA | 0648: ARABIC LETTER WAW+064F: ARABIC DAMMA, or 0648: ARABIC LETTER WAW+0657: ARABIC INVERTED DAMMA | 0627: ARABIC LETTER ALEF+064F: ARABIC DAMMA+0648: ARABIC LETTER WAW, or 0627: ARABIC LETTER ALEF+0648: ARABIC LETTER WAW+0657: ARABIC INVERTED DAMMA | peʃ+vɑːuː, or vɑːuː+inverted peʃ | ہندُو or ہندوٗ [hɪnduː] | پُورا or پوٗرا [puːrɑː] | اُوپر or اوٗپر [uːpar]
ɔ | 064E: ARABIC FATHA+0648: ARABIC LETTER WAW | 064E: ARABIC FATHA+0648: ARABIC LETTER WAW | 0627: ARABIC LETTER ALEF+064E: ARABIC FATHA+0648: ARABIC LETTER WAW | zabar+vɑːuː | نَو [nɔ] | شَوق [ʃɔq] | اَور [ɔr]

ain: The letter ع is used in words of Arabic origin. In these words it can fulfil the same function as the alɪf that carries a vowel at the beginning of an Urdu word, eg. عَرب [arab] Arab. The Urdu word اَرَب [arab] necessity, though pronounced the same, is a completely different word, distinguished only by its spelling. Note, in particular, that the equivalent of آ (alɪf+madː) [ɑː] is عا, as in عادت [ɑːdat] habit.

A following ع may also affect a short vowel diacritic to produce a long vowel sound as follows:

  1. [ɑː] from zabar followed by ain, eg. بَعد [bɑːd] after

  2. [e] from zer followed by ain, eg. شِعر [ʃer] verse

  3. [o] from peʃ followed by ain, eg. شُعلہ [ʃolɑː] flame

ʧʰoʈiː he and baɽiː he: The letters ہ and ح can also modify preceding short vowels as follows:

  1. [ɛ] from zabar followed by he, eg. اَحمد [ɛhmad] Ahmed, رَہنا [rɛhnɑː] to remain

  2. [ɛ] from zer followed by he, eg. مِہربانی [mɛhrbɑːniː] kindness, and واضِح [vɑːzɛh] clear

  3. [ɔ] from peʃ followed by he, eg. شُہرت [ʃɔhrat] fame, and توجُّہ [tavajːɔh] attention

The so-called 'silent' he that appears at the end of many words of Arabic or Persian derivation is pronounced [ɑː], eg. مکَہ [makːɑː] Mecca.

Nasalisation

Vowels may be nasalised, as at the end of the French word élan. This is indicated in Urdu by a glyph called nuːn ɣunːa that looks like the letter nuːn except that in word final position it has no dot, eg. ماں [mãː] mother, ٹاںگ [ʈãːg] leg, and کروں [karũː] I may do.

Vowel junctions

A hamzā plays more than one role in Urdu. One such role is to indicate the boundaries between vowel sounds when there is no intervening consonant. Depending on the vowels concerned, it is used in a number of different ways. It can also have two different shapes, one like the initial form of 'ain and the other more like an italic 's'.

In this example we see hamza in its isolated form, انشاءﷲ [ɪnʃalːaː] God willing.

When the second vowel is iː or e, represented by ی or ے, the hamzā 'sits on a chair' before the letter representing the second vowel, eg. کئی kaiː several; تیئیس teiːs twenty-three; کوئی koiː someone; گئے gae they went; گائے gɑːe they sang.

The short vowel ɪ as a second vowel is also represented by hamzā 'on its chair', eg. کوئلہ koɪlɑː coal; لائن lɑːɪn queue.

To represent hamzā 'on a chair' in initial or medial positions with the Nafees Nastaleeq font, you can use 0626: ARABIC LETTER YEH WITH HAMZA ABOVE. This is not an ideal solution, however, since the hamza then sits on a je that is not actually appropriate. This becomes particularly problematic under normalization, because 0626 decomposes canonically to 064A: ARABIC LETTER YEH plus 0654: ARABIC HAMZA ABOVE. It seems that a new Unicode character will be needed to resolve this issue.

When the second vowel is uː or o, represented by و, the hamzā typically sits directly on top of the و, eg. آؤ ɑːo come; جاؤں ʤɑːũː I may go. Note that the hamzā is often omitted in this situation. To represent this in Unicode, use 0624: ARABIC LETTER WAW WITH HAMZA ABOVE.
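The normalization problem with hamzā 'on a chair' can be seen with Python's standard `unicodedata` module. This sketch prints the canonical decompositions of the two precomposed hamza characters discussed above. Note that 0626 decomposes to 064A: ARABIC LETTER YEH, not to the 06CC: ARABIC LETTER FARSI YEH used for Urdu je. Note also that NFC leaves the sequence 06CC + 0654 as two code points, because Unicode has no precomposed Farsi yeh with hamza above. Exact output depends only on the Unicode data bundled with your Python.

```python
import unicodedata

# 0626 (used for hamzā 'on a chair' with Nafees Nastaleeq) and 0624 (hamzā on vɑːuː).
for ch in ("\u0626", "\u0624"):
    nfd = unicodedata.normalize("NFD", ch)
    parts = " + ".join(f"U+{ord(c):04X} {unicodedata.name(c)}" for c in nfd)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {parts}")

# Urdu je is 06CC, but no precomposed "06CC + hamza above" exists,
# so NFC leaves this sequence unchanged.
je_hamza = "\u06CC\u0654"
print(unicodedata.normalize("NFC", je_hamza) == je_hamza)  # True
```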

Many words have the vowel combinations iːɑ̃ː, iːe and iːõ, where hamzā is not typically used, eg. لڑکیاں laɽkiːɑ̃ː girls; چلیے ʧaliːe come on; لڑکیوں کا laɽkiːõ kɑː of the girls.

Hamzā is also used to represent izāfat when the preceding word ends in either choṭī he or ye (see below).

Izāfat

Izāfat [ɪzɑːfat] is the name given to the short vowel ɛ used to express a relationship between two words. It may be translated as 'of', as in 'the Lion of Punjab'.

This sound is mostly represented using zer. Sometimes, however, the combining mark is not shown, even though pronounced. Examples: شیرِ پنجاب ʃer ɛ panʤɑːb Lion of the Panjab; طالبِ علم tɑːlɪb ɛ ɪlm seeker of knowledge (a student).

Izāfat is represented by a combining hamzā when the preceding word ends in either choṭī he ہ or ye ی, eg. قطرۂآب qatrah ɛ ɑːb drop of water; ولئکامل valiː ɛ kɑːmɪl perfect saint.

I have a question about the use of ARABIC YEH WITH HAMZA ABOVE, which the Nafees Nastaleeq font requires. Note also that the Nafees Nastaleeq font will not work properly if there is a space after the izāfat.

Izāfat may also be shown as ے, with or without a combining hamzā, when the preceding word ends in a long vowel, eg. صدا ۓ بلند sadɑː ɛ buland a high voice; رو ۓ زمین ruː ɛ zamiːn the surface of the ground.

I have a question about how the Nafees Nastaleeq font handles precomposed vs decomposed versions of yeh baree with hamza.
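The precomposed/decomposed question can at least be pinned down at the character level. The sketch below uses Python's `unicodedata` to show that 06D3: ARABIC LETTER YEH BARREE WITH HAMZA ABOVE is canonically equivalent to ے followed by 0654: ARABIC HAMZA ABOVE, and that the same holds for 06C2 and ہ + 0654. Text containing either spelling should therefore be normalized (NFC here, as an assumption) before comparing or searching. How a given font renders each spelling is a separate matter that this does not test.

```python
import unicodedata

precomposed = "\u06D3"       # ۓ ARABIC LETTER YEH BARREE WITH HAMZA ABOVE
decomposed = "\u06D2\u0654"  # ے + ARABIC HAMZA ABOVE

print(precomposed == decomposed)  # False: the raw strings differ
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True

# The same relationship holds for ʧʰoʈiː he with hamza (also used for izāfat).
print(unicodedata.normalize("NFD", "\u06C2") == "\u06C1\u0654")  # True
```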

Arabic definite article

The pronunciation of ال (alif followed by lām) varies when it represents the Arabic definite article. This affects many words in Urdu that have come from Arabic, in particular names and adverbial expressions.

The lām is not pronounced if it precedes one of the following characters: ت (062A te), ث (062B se), د (062F dāl), ذ (0630 zāl), ر (0631 re), ز (0632 ze), س (0633 sīn), ش (0634 šīn), ص (0635 svād), ض (0636 zvād), ط (0637 toe), ظ (0638 zoe), ل (0644 lām), ن (0646 nūn). Instead, the following consonant is doubled; a tašdīd may sometimes be written to show this. Example: السلام علیکم [asːalɑːm alaikum] greetings.
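The rule above is mechanical enough to express in code. The following is a minimal sketch that makes two assumptions. First, the caller already knows that ال is the article; spelling alone cannot tell, since some words simply begin with alif + lām. Second, the article starts the word, so cases like بالکل or دارالحکومت are not handled. `article_lam_is_silent` is a hypothetical helper, not part of any library.

```python
import unicodedata

# Letters before which the article's lām is silent (list taken from the text above).
SUN_LETTERS = frozenset(chr(cp) for cp in (
    0x062A, 0x062B, 0x062F, 0x0630, 0x0631, 0x0632, 0x0633,
    0x0634, 0x0635, 0x0636, 0x0637, 0x0638, 0x0644, 0x0646,
))
ARTICLE = "\u0627\u0644"  # ال alif + lām


def article_lam_is_silent(word: str) -> bool:
    """Hypothetical check for a word that starts with the article ال.

    Returns True when the next letter is one of SUN_LETTERS, ie. the lām is
    not pronounced and that letter is doubled instead.
    """
    if not word.startswith(ARTICLE):
        return False
    # Skip combining marks written on the lām itself (eg. an optional sukuːn).
    following = next(
        (ch for ch in word[len(ARTICLE):] if not unicodedata.combining(ch)), None
    )
    return following in SUN_LETTERS


print(article_lam_is_silent("\u0627\u0644\u0633\u0644\u0627\u0645"))  # السلام -> True
print(article_lam_is_silent("\u0627\u0644\u0642\u0631\u0622\u0646"))  # القرآن -> False
```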

Often the alif is not pronounced after a short preceding word that ends in a vowel. If the preceding vowel was long, it is shortened in this process. Examples: بالکل [bɪlkul] absolutely; فی الحال [filhɑːl] at present.

Often the vowel of the article is pronounced [ʊ], eg. دارالحکومت [dɑːrʊlhʊkuːmat] capital.

Alphabet

0627: ARABIC LETTER ALEF

Urdu Called [alɪf].

This letter of the alphabet represents one of several possible vowels.

  • In word initial position, alone: a ɪ ʊ

  • Word initial, followed by ye, ای: iː e ɛ

  • Word initial, followed by vāū, او: uː o ɔ

  • With madd combining mark, آ: ɑː (see 0622 ARABIC LETTER ALEF WITH MADDA ABOVE)

  • Elsewhere: ɑː, unless part of the Arabic definite article (see below).

The alternative sounds possible in the initial combinations can be disambiguated, when necessary, by the use of combining marks. The combining marks are rarely used in normal text (with the exception of madd shown above). See a table of combining marks for vowels.

Arabic definite article: The pronunciation of ال (alif followed by lām) varies when it represents the Arabic definite article. This affects many words in Urdu that have come from Arabic, in particular names and adverbial expressions.

Often the alif is not pronounced after a short preceding word that ends in a vowel. If the preceding vowel was long, it is shortened in this process. Examples: بالکل [bɪlkul] absolutely; فی الحال [filhɑːl] at present.

Often the vowel of the article is pronounced [ʊ], eg. دارالحکومت [dɑːrʊlhʊkuːmat] capital.

The lām may also be silent. See 0644 ARABIC LETTER LAM.

0628: ARABIC LETTER BEH

Urdu [b] Called [be].

Nastaliq joining forms: ببب

This character can also be followed by 06BE: ARABIC LETTER HEH DOACHASHMEE to give بھ. This is a distinct letter of the Urdu alphabet, called [bʰe]. The two characters together represent the aspirated consonant [bʰ] in Urdu.

067E: ARABIC LETTER PEH

Urdu [p] Called [pe]

Nastaliq forms: پپپ

This character can also be followed by 06BE: ARABIC LETTER HEH DOACHASHMEE to give پھ. This is a distinct letter of the Urdu alphabet, called [pʰe]. The two characters together represent the aspirated consonant [pʰ] in Urdu.

062A: ARABIC LETTER TEH

Urdu [t] Called [te].

Nastaliq joining forms: تتت

This character can also be followed by 06BE: ARABIC LETTER HEH DOACHASHMEE to give تھ. This is a distinct letter of the Urdu alphabet, called [tʰe]. The two characters together represent the aspirated consonant [tʰ] in Urdu.

0679: ARABIC LETTER TTEH

Urdu [ʈ] Called [ʈe].

Nastaliq forms: ٹٹٹ ٹ

This character can also be followed by 06BE: ARABIC LETTER HEH DOACHASHMEE to give ٹھ. This is a distinct letter of the Urdu alphabet, called [ʈʰe]. The two characters together represent the aspirated retroflex consonant [ʈʰ] in Urdu.

062B: ARABIC LETTER THEH

Urdu [s] Called [se]

Nastaliq joining forms: ثثث

In Urdu, this letter only occurs in words of Arabic and Persian origin, and is much less common than س 0633 ARABIC LETTER SEEN, which is also pronounced [s].

062C: ARABIC LETTER JEEM

Urdu [ʤ] Called [ʤiːm]

Nastaliq forms: ججج

This character can also be followed by 06BE: ARABIC LETTER HEH DOACHASHMEE to give جھ. This is a distinct letter of the Urdu alphabet, called [ʤʰe]. The two characters together represent the aspirated consonant [ʤʰ] in Urdu.

0686: ARABIC LETTER TCHEH

Urdu [ʧ] Called [ʧe].

Nastaliq forms: چچچ چ

This character can also be followed by 06BE: ARABIC LETTER HEH DOACHASHMEE to give چھ. This is a distinct letter of the Urdu alphabet, called [ʧʰe]. The two characters together represent the aspirated consonant [ʧʰ] in Urdu.

062D: ARABIC LETTER HAH

Urdu [h] Called [baɽiː he].

Nastaliq forms: ححح ح

062E: ARABIC LETTER KHAH

Urdu [x] Called [xe].

Nastaliq forms: خخخ خ

062F: ARABIC LETTER DAL

Urdu [d] Called [dɑːl].

Nastaliq forms: ـد د

This character can also be followed by 06BE: ARABIC LETTER HEH DOACHASHMEE to give دھ. This is a distinct letter of the Urdu alphabet, called [dʰe]. The two characters together represent the aspirated consonant [dʰ] in Urdu.

0688: ARABIC LETTER DDAL

Urdu [ɖ] Called [ɖɑːl].

Nastaliq forms: ـڈ ڈ

This character can also be followed by 06BE: ARABIC LETTER HEH DOACHASHMEE to give ڈھ. This is a distinct letter of the Urdu alphabet, called [ɖʰe]. The two characters together represent the aspirated retroflex consonant [ɖʰ] in Urdu.

0631: ARABIC LETTER REH

Urdu [r] (pronounced with a trill) Called re [re]

Nastaliq forms: ـر ر

0691: ARABIC LETTER RREH

Urdu [ɽ] Called [ɽe].

Nastaliq forms: ـڑ ڑ

This character can also be followed by 06BE: ARABIC LETTER HEH DOACHASHMEE to give ڑھ. This is a distinct letter of the Urdu alphabet, called [ɽʰe]. The two characters together represent the aspirated retroflex consonant [ɽʰ] in Urdu.

0632: ARABIC LETTER ZAIN

Urdu [z] Called [ze].

Nastaliq forms: ـز ز

0698: ARABIC LETTER JEH

Urdu [ʒ] Called [ʒe].

Nastaliq forms: ـژ ژ

0633: ARABIC LETTER SEEN

Urdu [s] Called sīn [siːn].

Nastaliq forms: سسس س

In Urdu nastaliq text this can have two somewhat different shapes. In addition to the shape shown here, the wavy part of the letter is sometimes a single swash, especially when two sīn characters are written together.

Use the same character for both visual forms. When one or other of the possible shapes is desired, this should be produced by the font.

0634: ARABIC LETTER SHEEN

Urdu [ʃ] Called [ʃiːn].

Nastaliq forms: ششش ش

In Urdu nastaliq text this can have two somewhat different shapes. In addition to the shape shown here, the wavy part of the letter is sometimes a single swash, especially when two šīn characters are written together.

Use the same character for both visual forms. When one or other of the possible shapes is desired, this should be produced by the font.

0635: ARABIC LETTER SAD

Urdu [s] Called svād [svɑːd].

Nastaliq forms: صصص ص

Only used in words of Arabic origin.

0636: ARABIC LETTER DAD

Urdu [z] Called zvād [zvɑːd].

Nastaliq forms: ضضض ض

Only used in words of Arabic origin.

0637: ARABIC LETTER TAH

Urdu [t] Called toe [toe].

Nastaliq forms: ططط ط

Only used in words of Arabic origin.

0638: ARABIC LETTER ZAH

Urdu [z] Called zoe [zoe].

Nastaliq forms: ظظظ ظ

Only used in words of Arabic origin.

0639: ARABIC LETTER AIN

Urdu No sound. Called ain.

Nastaliq forms: ععع ع

No sound, but this letter is preserved in Arabic words in which it occurs.

At the beginning of a word it functions like alɪf, carrying a vowel, eg. عرب [arab], Arab. An initial long ɑː, usually represented by alɪf with madː آ, can sometimes be represented by ain followed by alɪf, eg. عادت [ɑːdat] habit.

Note how, although they are pronounced the same, عرب [arab] Arab, and ارب [arab] necessity, are different words.

In non-initial positions ain can cause a change in sound to preceding short vowels, resulting in long vowels, but not always the long form typically associated with a given short form.

ain changes a short [a] to [ɑː], eg. بعد [bɑːd] after.

It changes a short [ɪ] to [e], eg. شعر [ʃer] verse.

It changes a short [ʊ] to [o], eg. شعلہ [ʃolɑː] flame.

063A: ARABIC LETTER GHAIN

Urdu [ɣ] Called [ɣain].

Nastaliq forms: غغغ غ

0641: ARABIC LETTER FEH

Urdu [f] Called [fe].

Nastaliq forms: ففف ف

0642: ARABIC LETTER QAF

Urdu [q] Called [qɑːf].

Nastaliq forms: ققق ق

06A9: ARABIC LETTER KEHEH

Urdu [k] Called [kɑːf].

Nastaliq forms: ککک ک

When followed by alif or lām, this has a special rounded shape, eg. کا [kɑː], of; کل [kal], yesterday.

This character can also be followed by 06BE: ARABIC LETTER HEH DOACHASHMEE to give کھ. This is a distinct letter of the Urdu alphabet, called [kʰe]. The two characters together represent the aspirated consonant [kʰ] in Urdu.

06AF: ARABIC LETTER GAF

Urdu [g] Called [gɑːf].

Nastaliq forms: گگگ گ

When followed by alif or lām, this has a special rounded shape, eg. گام [gɑːm], step; گل [gul], rose.

This character can also be followed by 06BE: ARABIC LETTER HEH DOACHASHMEE to give گھ. This is a distinct letter of the Urdu alphabet, called [gʰe]. The two characters together represent the aspirated consonant [gʰ] in Urdu.

0644: ARABIC LETTER LAM

Urdu [l] Called [lɑːm].

Nastaliq forms: للل ل

Combined with a following alif, lām is usually written as لا, eg. گلاس [gilɑːs], glass. Sometimes, however, especially in words of Arabic origin such as the equivalent of the English prefix 'un-', the more Arabic form لا is used, eg. لاعلاج [lɑːʕilɑːʤ], incurable.

Note that I can't find a way to make this example work with a single font. To produce it I had to mix two different fonts!

Arabic definite article: The pronunciation of ال (alif followed by lām) varies when it represents the Arabic definite article. This affects many words in Urdu that have come from Arabic, in particular names and adverbial expressions.

The lām is not pronounced if it precedes one of the following characters: ت (062A te), ث (062B se), د (062F dāl), ذ (0630 zāl), ر (0631 re), ز (0632 ze), س (0633 sīn), ش (0634 šīn), ص (0635 svād), ض (0636 zvād), ط (0637 toe), ظ (0638 zoe), ل (0644 lām), ن (0646 nūn). Instead, the following consonant is doubled; a tašdīd may sometimes be written to show this. Example: السلام علیکم [asːalɑːm alaikum] greetings.

The sound of the alif may also be affected. See 0627 ARABIC LETTER ALEF.

0645: ARABIC LETTER MEEM

Urdu [m] Called [miːm].

Nastaliq forms: ممم م

0646: ARABIC LETTER NOON

Urdu [n] Called [nuːn].

Nastaliq forms: ننن ن

Used for the nasal consonant, but also used to represent word medial nasalisation of vowels, eg. ٹانگ [ʈãːg], leg.

0648: ARABIC LETTER WAW

Urdu [v uː o ɔ] Called vɑːuː.

Nastaliq forms: ـو و

Both a consonant and a vowel.

As a consonant, it is a cross between v and w, eg. والد [vɑːlɪd] father, نومبر [navambar] November.

As a vowel, whether word initial after alɪf, او, or elsewhere on its own, it is one of [uː o ɔ].

The alternative vowel sounds can be disambiguated, when necessary, by the use of combining marks. The combining marks are rarely used in normal text. See a table of combining marks for vowels.

In a number of words of Persian origin beginning with خوا it is silent, eg. خواب [xɑːb] dream.

In two very common words it represents the short vowel [ʊ]: خود [xʊd] self, and خوش [xʊʃ] happy.

06C1: ARABIC LETTER HEH GOAL

Urdu [h] Called [ʧʰoʈiː he].

Nastaliq forms: ہہہ ہ

The initial form is written with a hook beneath, eg. ہندو [hɪnduː] Hindu. The medial form can be written with or without the hook, eg. کہاں [kahãː] where.

A special initial form is used before alif or lam, eg. ہاں [hãː] yes, and اہل [ahl] people.

Silent he: In Urdu words this letter is pronounced at the end of a word. Many Arabic and Persian words end in a he that is pronounced [ɑː] (just like alif), eg. مکّہ [makːɑː] Mecca.

A word like [rɑːʤɑː], king, can be spelled with either an alif or a he, ie. راجا or راجہ. This is because the original Indian word was borrowed into Persian, then back into Urdu. Both spellings are now acceptable.

Doubled he: In order to distinguish some words where the final h is pronounced rather than representing [ɑː] or [ɛ], the choṭī he is sometimes doubled, eg. کہہ [kɛh] say vs. کہ [kɛ] that.

Aspiration: Until recently choṭī he ہ and do cašmī he ھ could be used interchangeably, eg. ہاں or ھاں for [hãː] yes. Modern practice is to use the latter exclusively for aspiration, though people do still occasionally confuse the two.

Vowels: ʧʰoʈiː he can convert the preceding vowel from what would otherwise have been [a], [ɪ] or [ʊ] to [ɛ] or [ɔ], eg. اَحمد [ɛhmad] Ahmed, رَہنا [rɛhnɑː] to remain.

06CC: ARABIC LETTER FARSI YEH

Urdu [iː e ɛ j] Called je.

Nastaliq forms: یی ی

The Urdu letter je has two distinct visual forms requiring the use of two Unicode characters: this one ی and ے ‎06D2: ARABIC LETTER YEH BARREE.

The letter je represents both a consonant and a vowel.

At the beginning and middle of a word the form ی usually represents a consonant, eg. یار [jɑːr], friend and سایہ [sɑːjɑː], shadow.

As a vowel, the form ی is used in word initial position after alɪf ای, and in medial position on its own, to represent [iː e ɛ], eg. ایک [ek] one, سینہ [siːnɑː] breast, and کیسا [kɛsɑː] how.

To represent the vowels [e] or [ɛ] in final position the form ے is used, eg. لڑکے [laɽke] boys.

In word final position the vowel form ی represents only [iː], eg. لڑکی [laɽkiː] girl.

Also, in isolated form ی represents [iː], whereas ے stands for [e] or [ɛ].
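The word-final rules above can be sketched as a small lookup. This hypothetical helper, `final_je_vowel`, looks only at the last base character (skipping trailing combining marks). It is an illustration of the ی/ے distinction, not a reading algorithm, and it ignores precomposed forms such as ۓ.

```python
import unicodedata
from typing import Optional

YEH = "\u06CC"         # ی ARABIC LETTER FARSI YEH
YEH_BARREE = "\u06D2"  # ے ARABIC LETTER YEH BARREE


def final_je_vowel(word: str) -> Optional[str]:
    """Vowel(s) a word-final je can stand for, following the rules above."""
    # Find the last character that is not a combining mark (eg. an optional vowel sign).
    base = next((ch for ch in reversed(word) if not unicodedata.combining(ch)), "")
    if base == YEH:
        return "iː"
    if base == YEH_BARREE:
        return "e or ɛ"
    return None  # the word does not end in je


print(final_je_vowel("\u0644\u0691\u06A9\u06CC"))  # لڑکی [laɽkiː] girl -> iː
print(final_je_vowel("\u0644\u0691\u06A9\u06D2"))  # لڑکے [laɽke] boys -> e or ɛ
```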

The alternative vowel sounds can be disambiguated, when necessary, by the use of combining marks. The combining marks are rarely used in normal text. See a table of combining marks for vowels.

The baɽiː je form is also used to represent izāfat. See 06D2 ARABIC LETTER YEH BARREE.

06D2: ARABIC LETTER YEH BARREE

Urdu [e ɛ] Called [baɽiː je].

Nastaliq forms: ـے ے

The Urdu letter [je] has two distinct visual forms requiring the use of two Unicode characters: this one, ے, and ی (06CC ARABIC LETTER FARSI YEH).

The letter [je] represents both a consonant and a vowel, but this form is used only for vowels.

At the beginning and middle of a word the form ی usually represents a consonant, eg. یار [jɑːr], friend and سایہ [sɑːjɑː], shadow.

As a vowel, the form ی is used in word initial position after alif ای, and in medial position on its own, to represent [iː e ɛ], eg. ایک [ek], one and سینہ [siːnɑː], breast.

To represent the vowels [e] or [ɛ] in final position the form ے is used, eg. لڑکے [laɽke], boys.

In word final position the vowel form ی represents only [iː], eg. لڑکی [laɽkiː], girl.

Also, in isolated form ی represents [iː], whereas ے stands for [e] or [ɛ].

The alternative sounds possible in the initial combinations can be disambiguated, when necessary, by the use of combining marks. The combining marks are rarely used in normal text. See a table of combining marks for vowels.

Izāfat: [ɪzɑːfat] is the name given to the short vowel [ɛ] used to express a relationship between two words. It may be translated as 'of', as in 'the Lion of Punjab'.

This sound is mostly represented using zer, but can also be represented with a combining hamza in a couple of cases.

Izāfat may also be shown as ے, with or without a combining hamzā, when the preceding word ends in a long vowel, eg. صدا ۓ بلند [sadɑː ɛ buland] a high voice; رو ۓ زمین [ruː ɛ zamiːn] the surface of the ground.

There are other ways in which izafat can be formed.

Additional characters

0622: ARABIC LETTER ALEF WITH MADDA ABOVE

Urdu [ɑː] Called madː.

Used in combination with alɪf at the beginning of a word to give the sound [ɑː], eg. آب [ɑːb] water. Unlike the short vowel diacritics, the diacritic madː is never omitted.

alɪf with madː is exceptionally used in non-initial position in the word for Koran, القرآن.

madː means increasing
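Because 0622 has a canonical decomposition (0627 ARABIC LETTER ALEF + 0653 ARABIC MADDAH ABOVE), the same letter may arrive precomposed or as a two-character sequence. A minimal sketch, assuming NFC is the form you want to work in. `madd_positions` is a hypothetical helper that reports where آ occurs, including the non-initial case in القرآن.

```python
import unicodedata

ALEF_MADDA = "\u0622"  # آ ARABIC LETTER ALEF WITH MADDA ABOVE


def madd_positions(word: str) -> list:
    """Indexes (in the NFC form of word) where alɪf with madː occurs."""
    nfc = unicodedata.normalize("NFC", word)
    return [i for i, ch in enumerate(nfc) if ch == ALEF_MADDA]


print(madd_positions("\u0622\u062C"))                          # آج -> [0]
print(madd_positions("\u0627\u0653\u062C"))                    # decomposed آج -> [0]
print(madd_positions("\u0627\u0644\u0642\u0631\u0622\u0646"))  # القرآن -> [4]
```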

06BA: ARABIC LETTER NOON GHUNNA

Urdu Nasalisation. Called nuːn ɣunːa.

Nastaliq forms: ںںں ں

Indicates that the preceding vowel is nasalised.

At the end of a word, an undotted form is used, eg. ماں [mãː], mother, کروں [karũː], I may do.

Nasalisation within a word uses a form with a dot that looks just like the letter 0646 ARABIC LETTER NOON ن, eg. ٹاںگ [ʈãːg] leg.

This is not counted as a regular letter of the Urdu alphabet.

06BE: ARABIC LETTER HEH DOACHASHMEE

Urdu Called do cašmī he.

Aspiration: This character is used to create the aspirated letters of the Urdu alphabet. Each letter is composed of two characters. The letters are: بھ bʰe, پھ pʰe, تھ tʰe, ٹھ ʈʰe, جھ ʤʰe, چھ ʧʰe, دھ dʰe, ڈھ ɖʰe, ڑھ ɽʰe, کھ kʰe, and گھ gʰe.
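Since each aspirated letter is two code points, anything that counts or iterates over 'letters' has to group them. A minimal sketch, assuming that the eleven base letters listed above are the only ones that combine with 06BE this way, and that no combining mark is typed between the base and 06BE. `urdu_letters` is a hypothetical name.

```python
DO_CHASHMI_HE = "\u06BE"  # ھ ARABIC LETTER HEH DOACHASHMEE
# Base letters of the aspirated digraphs listed above: ب پ ت ٹ ج چ د ڈ ڑ ک گ
ASPIRABLE = frozenset("\u0628\u067E\u062A\u0679\u062C\u0686\u062F\u0688\u0691\u06A9\u06AF")


def urdu_letters(text: str) -> list:
    """Split text into alphabet units, keeping base letter + ھ together."""
    units = []
    for ch in text:
        if ch == DO_CHASHMI_HE and units and units[-1] in ASPIRABLE:
            units[-1] += ch  # merge into an aspirated letter such as کھ
        else:
            units.append(ch)
    return units


letters = urdu_letters("\u06A9\u06BE\u0627\u0646\u0627")  # کھانا
print(len(letters))  # 4: کھ + ا + ن + ا
```

A standalone ھ (for example the ہجری abbreviation) is left as its own unit, because it does not follow one of the listed base letters.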

Until recently choṭī he 06C1 ARABIC LETTER HEH GOAL ہ and do cašmī he could be used interchangeably to express aspiration, eg. ہاں or ھاں for [hãː] yes. Modern practice is to use this character exclusively for aspiration, though people do still occasionally confuse the two.

Calendar indicator: Dates using the Muslim calendar are followed by the word ہجری [hɪʤriː], which is abbreviated with the symbol ھ.

06C2: ARABIC LETTER HEH GOAL WITH HAMZA ABOVE

Urdu [ɛ]

Izāfat [ɪzɑːfat] is the name given to the short vowel [ɛ] used to express a relationship between two words. It may be translated as 'of', as in 'the Lion of Punjab'.

This sound is mostly represented using zer, but in certain cases can be represented with a combining hamza. One such case occurs when the preceding word ends in choṭī he ہ: eg. قطرۂآب [qatrah ɛ ɑːb] drop of water.

There are other ways in which izafat can be formed.

06D3: ARABIC LETTER YEH BARREE WITH HAMZA ABOVE

Urdu [ɛ]

Izāfat [ɪzɑːfat] is the name given to the short vowel [ɛ] used to express a relationship between two words. It may be translated as 'of', as in 'the Lion of Punjab'.

This sound is mostly represented using zer, but can also be represented with a combining hamza in a couple of cases.

Izāfat may also be shown as ے, with or without a combining hamzā, when the preceding word ends in a long vowel, eg. صدا ۓ بلند [sadɑː ɛ buland] a high voice; روۓزمین [ruː ɛ zamiːn] the surface of the ground.

There are other ways in which izafat can be formed.

Combining marks

064B: ARABIC FATHATAN

Urdu [an]

This is a doubled zabar. These marks appear at the ends of certain Arabic adverbs. The doubled zabar is the most common of the three marks of this type. Although the mark appears over an alif, the vowel sound is short. Examples: یقیناً [jaqiːnan] certainly; مثلاً [masalan] for example.

064C: ARABIC DAMMATAN

Urdu [un]

Doubled peš.

064D: ARABIC KASRATAN

Urdu [in]

Doubled zer.

064F: ARABIC DAMMA

Urdu [ʊ uː o ɔ] Called peʃ.

Rarely used; only where pronunciation needs to be spelled out. peʃ means forward.

Above a consonant it typically indicates a following short [ʊ], eg. بُب [bʊb]. At the beginning of a word it appears above alɪf or ain, eg. اُب [ʊb].

When the base consonant is followed by certain other letters, peʃ represents different sounds, as shown below:

  • In combination with a following vɑːuː this represents [uː], eg. پُورا [puːrɑː] full, and اُوپر [uːpar] above.

  • In combination with a following ain this represents [o], eg. شُعلہ [ʃolɑː] flame, and توقُّع [tavaqːo] hope.

  • A following ʧʰoʈiː he or baɽiː he turns this into [ɔ], eg. شُہرت [ʃɔhrat] fame, and توجُّہ [tavajːɔh] attention.

In two very common words, with a following vɑːuː it represents the short vowel [ʊ]: خُود [xʊd] self, and خُوش [xʊʃ] happy.

The word وہ [vo] that, he, she, it is irregular.

See a table of combining marks for vowels.

0650: ARABIC KASRA

Urdu [ɪ iː ɛ e] Called zer.

Rarely used; only where pronunciation needs to be spelled out. zer means below.

Below a base consonant it typically indicates a following short [ɪ], eg. بِب [bɪb]. At the beginning of a word it appears below alɪf or ain, eg. اِتْنَا [ɪtnɑː] so much and عِلْم [ɪlm] knowledge.

When the base consonant is followed by certain other letters, zer represents different sounds, as shown below:

  • In combination with a following je this represents [iː], eg. سِینہ [siːnɑː] breast, and اِیمان [iːmɑːn] faith.

  • In combination with a following ain this represents [e], eg. شِعر [ʃer] verse, and واقِع [vɑːqe] situated.

  • A following ʧʰoʈiː he or baɽiː he turns this into [ɛ], eg. مِہربانی [mɛhrbɑːniː] kindness, and واضِح [vɑːzɛh] clear.

See a table of combining marks for vowels.

Izāfat (pronounced [ɪzɑːfat]) is the name given to the short vowel [ɛ] when used to express a relationship between two words. It may be translated as 'of', as in 'the Lion of Punjab'.

This sound is mostly represented using zer. Sometimes, however, the combining mark is not shown, even though pronounced. Examples: شیرِ پنجاب [ʃer ɛ panʤɑːb] Lion of the Panjab; طالبِ علم [tɑːlɪb ɛ ɪlm] seeker of knowledge (a student).

There are other ways in which izāfat can be formed.

0651: ARABIC SHADDA

Urdu Called taʃdiːd.

Doubles the sound of the base consonant, eg. ستّر [sattar] seventy. More often than not, this is not written.

taʃdiːd means strengthening.
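When a vowel mark and taʃdiːd are written on the same consonant, they can be typed in either order. Unicode canonical ordering (applied by NFC and NFD) sorts such marks by their combining class, so vowelled text should be normalized before it is compared. The sketch below shows this using only Python's standard `unicodedata` module.

```python
import unicodedata

ZABAR, TASHDID, TE = "\u064E", "\u0651", "\u062A"

# Canonical combining classes: zabar 30, taʃdiːd 33.
print(unicodedata.combining(ZABAR), unicodedata.combining(TASHDID))  # 30 33

typed_a = TE + TASHDID + ZABAR  # taʃdiːd typed first
typed_b = TE + ZABAR + TASHDID  # zabar typed first
print(typed_a == typed_b)  # False: different code point order
print(unicodedata.normalize("NFC", typed_a) == unicodedata.normalize("NFC", typed_b))  # True
```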

064E: ARABIC FATHA

Urdu [a ɑː ɛ ɔ] Called zabar.

Rarely used; only where pronunciation needs to be spelled out. zabar means above.

Above a consonant it typically indicates a following short [a], eg. بَب [bab]. At the beginning of a word it appears above alɪf or ain, eg. اَب [ab] now, and عَرَب [arab] Arab.

When the base consonant is followed by certain other letters, zabar represents different sounds, as shown below:

  • In combination with a following alɪf, silent ʧʰoʈiː he, or ain this represents [ɑː], eg. بَاغ [bɑːɣ] garden, مکَہ [makːɑː] Mecca, and بَعد [bɑːd] after.

  • In combination with a following je (both forms) this represents [ɛ], eg. جَیسا [ʤɛsɑː] as, اَیسا [ɛsɑː] such, and ہَے [hɛ] is.

  • A following ʧʰoʈiː he or baɽiː he also turns this into [ɛ], eg. اَحمد [ɛhmad] Ahmed, and رَہنا [rɛhnɑː] to remain.

  • In combination with a following vɑːuː this represents [ɔ], eg. شَوق [ʃɔq] keenness, and اَور [ɔr] and.

See a table of combining marks for vowels.


652   0652: ARABIC SUKUN

Urdu Called sukuːn or ʤazm.

Rarely used; indicates absence of a vowel between consonants, eg. سَخْت [saxt] hard.
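Because sukuːn and the vowel marks are optional and rarely written, search and comparison code often ignores them. A minimal sketch, assuming that dropping the range U+064B (fathatan) to U+0652 (sukun) is what the application wants; the name `strip_vowel_marks` is hypothetical, and real software may need to treat other marks (eg. superscript alef) too.

```python
# Arabic marks from FATHATAN (U+064B) through SUKUN (U+0652), inclusive
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_vowel_marks(text: str) -> str:
    """Remove zabar, zer, peʃ, taʃdiːd, sukuːn and the tanwin marks."""
    return "".join(ch for ch in text if ch not in HARAKAT)


# سَخْت (with zabar and sukuːn) reduces to سخت
print(strip_vowel_marks("\u0633\u064E\u062E\u0652\u062A") == "\u0633\u062E\u062A")  # True
```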

It has various possible forms, including a small round circle, something that looks like peʃ, and something like a circumflex. (There is another Unicode character that provides an alternative visual form, 06E1: ARABIC SMALL HIGH DOTLESS HEAD OF KHAH, but it is better to use this character and select the variant required using a font.)

This diacritic is never written above the final character in a word, because as a rule a short vowel is not pronounced in this position.

sukuːn is an Arabic word meaning rest or pause.


656   0656: ARABIC SUBSCRIPT ALEF

Urdu

Used to indicate a long [] vowel, or [i] as contrasted with [e], eg. نُحْیٖ. This diacritic is not usually needed, and serves only to emphasise that the vowel is long.


657   0657: ARABIC INVERTED DAMMA

Urdu

Used to indicate a long [] vowel, or [ʊ] as contrasted with [ɔ], eg. حبل حلالہٗ. This diacritic is not usually needed, and serves only to emphasise that the vowel is long.


658   0658: ARABIC MARK NOON GHUNNA

Nasalisation of Urdu vowels is normally indicated by 06BA ARABIC LETTER NOON GHUNNA ں. At the end of a word this has no dot above, but in the middle of a word it looks exactly like 0646 ARABIC LETTER NOON ن (and some people may use this character for this purpose).

This diacritic is used when writers want to make it clear that the glyph represents nasalisation rather than the letter nuːn. Its use is not standardised; it appears only when the writer prefers it, and is fairly uncommon, eg. ساں٘گ. The CRULP fonts don't appear to show the diacritic as expected.
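The mark is a combining character stored after the nuːn ɣunːa letter it sits on. The sketch below assumes the example ساں٘گ is encoded as seen, alef, noon ghunna, mark noon ghunna, gaf, and prints each code point's general category: the letters are Lo, the mark is Mn.

```python
import unicodedata

# Assumed encoding of ساں٘گ
word = "\u0633\u0627\u06BA\u0658\u06AF"

for ch in word:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")
```

Note that U+06BA and U+0646 are different letters, so a search for one will not match the other unless the application folds them together.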


670   0670: ARABIC LETTER SUPERSCRIPT ALEF

Urdu [ɑː]

Used in a few Arabic words over the final form of 06CC ARABIC LETTER FARSI YEH ی to produce the sound [ɑː], eg. اعلیٰ [alɑː] paramount, highest; دعویٰ [davɑː] law suit, claim.


Punctuation

60c   060C: ARABIC COMMA

Urdu Called [].

Looks like ،


60d   060D: ARABIC DATE SEPARATOR

Urdu Separator used in dates between the (numeric) date and the month name, eg. ۴؍صفر ؁۱۳۰۲ھ.


61f   061F: ARABIC QUESTION MARK

Urdu Called [].

Looks like ؟


66b   066B: ARABIC DECIMAL SEPARATOR

Urdu Called [aʃɑːrɪjɑ].

Looks like ٫

This should look like a hamzā in Urdu, eg. ۲۵۲۴٫۲۳ [do hazɑːr pɑ̃ːʧ sau caubiːs aʃɑːrɪjɑː do tiːn] 2524.23.
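Digits are stored most significant first, just as in English, with the decimal separator between them in logical order. The sketch below converts an ASCII decimal string to the form in the example by mapping 0 to 9 onto U+06F0 to U+06F9 and the full stop onto U+066B. The function name `to_urdu_number` is hypothetical; production code would more likely rely on locale data than on a hand-built table.

```python
# Map ASCII digits to EXTENDED ARABIC-INDIC DIGIT ZERO..NINE (U+06F0..U+06F9)
# and the ASCII full stop to ARABIC DECIMAL SEPARATOR (U+066B).
TABLE = {ord(str(d)): 0x06F0 + d for d in range(10)}
TABLE[ord(".")] = 0x066B


def to_urdu_number(ascii_number: str) -> str:
    """Convert a string such as '2524.23' to Extended Arabic-Indic form."""
    return ascii_number.translate(TABLE)


print(to_urdu_number("2524.23") == "\u06F2\u06F5\u06F2\u06F4\u066B\u06F2\u06F3")  # True
```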


6d4   06D4: ARABIC FULL STOP

Urdu Called [].

Looks like ۔


Signs & symbols

601   0601: ARABIC SIGN SANAH

Urdu Called [sanh].

Gregorian dates are indicated by placing this long sweep below the year digits, followed by the word عیسوی [iːsviː] Christian era, which is usually abbreviated as a hamza ء.

Dates using the Muslim calendar are followed by the word ہجری [hɪʤriː], which is abbreviated with the symbol ھ.

The sanh sign is typed before the digits (in a right-to-left context), eg. ؁۲۰۰۴ء (2004). It is not a combining character, even though it displays beneath the digits.
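In logical (memory) order the sign therefore comes first, followed by the digits and then the abbreviation. A minimal sketch of the sequence in the example, assuming the hamza abbreviation is U+0621 ARABIC LETTER HAMZA. Unicode classifies U+0601 as a format character (category Cf), not a combining mark, which matches the note above.

```python
import unicodedata

SANAH = "\u0601"  # ARABIC SIGN SANAH, typed before the digits
HAMZA = "\u0621"  # assumed abbreviation of عیسوی (Christian era)
year = "".join(chr(0x06F0 + int(d)) for d in "2004")  # ۲۰۰۴

date = SANAH + year + HAMZA
print([f"U+{ord(c):04X}" for c in date])
print(unicodedata.category(SANAH))  # Cf
```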

The sanh is derived from the Arabic word for year سنة.


603   0603: ARABIC SIGN SAFHA

Urdu Called [safah].

Used to indicate a page number, where English would use an abbreviation such as "pp. 35-45", eg. ؃۴۵. The stroke may be elongated and pass under the number.

The symbol is derived from the stroke used for 0635: ARABIC LETTER SAD.


600   0600: ARABIC NUMBER SIGN

Urdu Used to indicate the beginning of a number, eg. ؀۱۲۳. The stroke may be elongated and pass under the number, but this is not a combining character.


602   0602: ARABIC FOOTNOTE MARKER

Urdu Used to indicate that a number is a footnote, eg. ؎۵. The number usually sits above the symbol, although this is not a combining character. I haven't been able to establish whether it should be typed before or after the number, though I think before. The Nafees font doesn't seem to support this.

Do not confuse this with 060E ARABIC POETIC VERSE SIGN.


60f   060F: ARABIC SIGN MISRA

Urdu Called misra.

Urdu poems are typically built from couplets. This symbol marks a single line (misra) of a couplet (shayr).

This sign is used when quoting a line of verse in text. It is used at the beginning of the line, and is followed by the line of verse. See an example.


610   0610: ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM

Urdu Represents sallallahu alayhe wasallam [sallallao alae va sallam] (may God's peace and blessings be upon him) صلّى الله عليه وسلّم. Used over the name of Mohammed.

One of several marks that represent phrases expressing the status of a person, most having specifically religious meaning.

The mark is really associated with a word, rather than a character, but the placement is left to the user. The mark is often added somewhere in the middle of a name, but commonly appears towards the end. This depends to some extent on the letter shapes present and the calligraphic style in use, eg. محمّدؐ [muhamːed sallallao alae va sallam].
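Because U+0610 is a combining mark (category Mn), where it appears in the name is simply a matter of which letter it is stored after. The sketch below assumes the example محمّدؐ is encoded as meem, hah, meem, shadda, dal followed by the sign, and builds two alternative placements. The helper `with_sign_after` is hypothetical.

```python
import unicodedata

NAME = "\u0645\u062D\u0645\u0651\u062F"  # محمّد: meem, hah, meem, shadda, dal
SIGN = "\u0610"  # ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM


def with_sign_after(word: str, index: int) -> str:
    """Insert SIGN immediately after the code point at position index."""
    return word[: index + 1] + SIGN + word[index + 1 :]


at_end = with_sign_after(NAME, len(NAME) - 1)  # after the final dal
in_middle = with_sign_after(NAME, 1)           # after the hah
print(unicodedata.category(SIGN))  # Mn
```

The middle placement deliberately avoids the letter carrying taʃdiːd, so the sign does not sit between a letter and its shadda.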


611   0611: ARABIC SIGN ALAYHE ASSALLAM

Urdu Represents alayhe asallam [alejsallam] (upon him be peace) عليه السّلام. Used over the names of prophets other than Mohammed.

One of several marks that represent phrases expressing the status of a person, most having specifically religious meaning.

The mark is really associated with a word, rather than a character, but the placement is left to the user. The mark is often added somewhere in the middle of a name, but commonly appears towards the end. This depends to some extent on the letter shapes present and the calligraphic style in use, eg. عیسؑیٰ [isaː salejsallam] Christ, upon him be peace!


612   0612: ARABIC SIGN RAHMATULLAH ALAYHE

Urdu Represents rahmatulla alayhe [raːmatʊlla alee] (may God have mercy upon him) رحمت الله عليه. Used over the names of saints, religious authorities, and other deceased pious persons.

One of several marks that represent phrases expressing the status of a person, most having specifically religious meaning.

The mark is really associated with a word, rather than a character, but the placement is left to the user. The mark is often added somewhere in the middle of a name, but commonly appears towards the end. This depends to some extent on the letter shapes present and the calligraphic style in use, eg. قاضی نور محمّدؒ [kaziː nur mamed rahmatulla alayhe] Qazi Nur Muhammad, may God have mercy upon him!


613   0613: ARABIC SIGN RADI ALLAHOU ANHU

Urdu Represents radi allahu 'anhu [raziallaːo ano] (may God be pleased with him) رضي الله عنه. Used over the names of the Companions of the Prophet.

One of several marks that represent phrases expressing the status of a person, most having specifically religious meaning.

The mark is really associated with a word, rather than a character, but the placement is left to the user. The mark is often added somewhere in the middle of a name, but commonly appears towards the end. This depends to some extent on the letter shapes present and the calligraphic style in use, eg. ابوبکرؓ [abu bakr, raziallaːo ano] Abu Bakr, may God be pleased with him!


614   0614: ARABIC SIGN TAKHALLUS

Urdu Sign placed over the name or nom-de-plume of a poet, or in some writings used to mark all proper names.

The mark is really associated with a word, rather than a character, but the placement is left to the user. The mark is often added somewhere in the middle of a name, but commonly appears towards the end. This depends to some extent on the letter shapes present and the calligraphic style in use, eg. عطاشادؔ [ataː ʃaːd] Ata Shad (author's name). There seems to be a problem displaying this with Nafees fonts.


60e   060E: ARABIC POETIC VERSE SIGN

Urdu Often used to mark the beginning of poetic verse. For an example see Figure 8 in Jonathan Kew's examples.

Do not confuse this with 0602 ARABIC FOOTNOTE MARKER.


Digits

6f0   06F0: EXTENDED ARABIC-INDIC DIGIT ZERO

Urdu Called [sɪfar].

Digit ۰


6f1   06F1: EXTENDED ARABIC-INDIC DIGIT ONE

Urdu Called [ek].

Digit ۱
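The Extended Arabic-Indic digits are ordinary decimal digits (general category Nd) in Unicode, so many libraries can parse them directly. For example, Python's built-in `int()` accepts any Unicode decimal digits; the snippet below is a minimal illustration.

```python
import unicodedata

urdu = "\u06F1\u06F2\u06F3"  # ۱۲۳

print(int(urdu))                       # 123
print(unicodedata.decimal("\u06F0"))   # 0
print(unicodedata.category("\u06F1"))  # Nd
```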
