<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>&#62;&#62; blog &#187; writings</title>
	<atom:link href="http://rishida.net/blog/?feed=rss2&#038;cat=3" rel="self" type="application/rss+xml" />
	<link>http://rishida.net/blog</link>
	<description>News of changes to my main site, and W3C related posts.</description>
	<lastBuildDate>Sun, 22 Aug 2010 09:53:41 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Webfonts example pages</title>
		<link>http://rishida.net/blog/?p=464</link>
		<comments>http://rishida.net/blog/?p=464#comments</comments>
		<pubDate>Sun, 22 Aug 2010 09:45:59 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[script notes]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[writings]]></category>
		<category><![CDATA[myanmar]]></category>
		<category><![CDATA[urdu]]></category>
		<category><![CDATA[webfonts]]></category>
		<category><![CDATA[woff]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=464</guid>
		<description><![CDATA[Webfonts, and WOFF in particular, have been in the news again recently, so I thought I should mention that a few days ago I changed my pages describing Myanmar and Arabic-for-Urdu scripts so that you can download the necessary font support for the foreign text, either as a TTF linked font or as a WOFF font.
You [...]]]></description>
			<content:encoded><![CDATA[<p>Webfonts, and WOFF in particular, have been <a href="http://www.w3.org/News/2010#entry-8877">in the news again recently</a>, so I thought I should mention that a few days ago I changed my pages describing Myanmar and Arabic-for-Urdu scripts so that you can download the necessary font support for the foreign text, either as a TTF linked font or as a WOFF font.</p>
<p>You can find the Myanmar page at <a href="http://rishida.net/scripts/myanmar/">http://rishida.net/scripts/myanmar/</a>.  Look for the links in the side bar to the right, under the heading &#8220;Fonts&#8221;. </p>
<p>The Urdu page, using the beautiful Nastaliq script, is at <a href="http://rishida.net/scripts/urdu/">http://rishida.net/scripts/urdu/</a>.</p>
<p>(Note that the examples of short vowels don&#8217;t use the nastaliq style.  Scroll down the page a little further.)</p>
<p>I haven&#8217;t had time to check whether all the opentype features are correctly rendered, but I&#8217;ve been doing Mac testing of the <a href="http://www.w3.org/International/tests/tests-html-css/list-fonts">i18n webfonts tests</a>, and it looks promising. (More on that later.)  The Urdu font doesn&#8217;t rely on OS rendering, which should help.</p>
<p>Here are some examples of the text on the page:<br />
<img src="images/urdu-myanmar-woff.png" alt="Examples of Urdu script and Myanmar script." /></p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&amp;p=464</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An Introduction to Indic Scripts updated</title>
		<link>http://rishida.net/blog/?p=456</link>
		<comments>http://rishida.net/blog/?p=456#comments</comments>
		<pubDate>Mon, 16 Aug 2010 06:26:50 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[script notes]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[writings]]></category>
		<category><![CDATA[character]]></category>
		<category><![CDATA[indian]]></category>
		<category><![CDATA[indic]]></category>
		<category><![CDATA[scripts]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=456</guid>
		<description><![CDATA[http://rishida.net/scripts/indic-overview/
I finally got around to refreshing this article, by converting the Bengali, Malayalam and Oriya examples to Unicode text. Back when I first wrote the article, it was hard to find fonts for those scripts.
I also added a new feature: In the HTML version, click on any of the examples in indic text and a [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://rishida.net/scripts/indic-overview/">http://rishida.net/scripts/indic-overview/</a></p>
<p>I finally got around to refreshing this article, by converting the Bengali, Malayalam and Oriya examples to Unicode text. Back when I first wrote the article, it was hard to find fonts for those scripts.</p>
<p>I also added a new feature: In the HTML version, click on any of the examples in indic text and a pop-up appears at the bottom right of the page, showing which characters the example is composed of.  The pop-up lists the characters in order, with Unicode names, and shows the characters themselves as graphics.</p>
<p>I have not yet updated this article&#8217;s incarnation as <a href="http://www.unicode.org/notes/tn10/">Unicode Technical Note #10</a>.  The Indian Government also used this article, and made a number of small changes.  I have yet to incorporate those, too.</p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&amp;p=456</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What does it take to display non-Latin HTML pages?</title>
		<link>http://rishida.net/blog/?p=445</link>
		<comments>http://rishida.net/blog/?p=445#comments</comments>
		<pubDate>Wed, 04 Aug 2010 18:14:49 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[writings]]></category>
		<category><![CDATA[character encoding]]></category>
		<category><![CDATA[fonts]]></category>
		<category><![CDATA[unicode]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=445</guid>
		<description><![CDATA[I recently came across an email thread where people were trying to understand why they couldn&#8217;t see Indian content on their mobile phones.  Here are some notes that may help to clarify the situation.  They are not fully developed! Just rough jottings, but they may be of use.
Let&#8217;s assume, for the sake of [...]]]></description>
			<content:encoded><![CDATA[<p>I recently came across an email thread where people were trying to understand why they couldn&#8217;t see Indian content on their mobile phones.  Here are some notes that may help to clarify the situation.  They are not fully developed! Just rough jottings, but they may be of use.</p>
<p>Let&#8217;s assume, for the sake of an example, that the goal is to display a page in Hindi, which is written using the devanagari script. These principles, however, apply to one degree or another to all languages that use characters outside the ASCII range.</p>
<p>Let&#8217;s start by reviewing some fundamental concepts: character encodings and fonts.  If you are familiar with these concepts, skip to the next heading.</p>
<h4>Character encodings and fonts</h4>
<p>Content is composed of a sequence of <em>characters</em>. Characters represent letters of the alphabet, punctuation, etc. But content is stored in a computer as a sequence of bytes, which are numeric values. Sometimes more than one byte is used to represent a single character. Like codes used in espionage, the way that the sequence of bytes is converted to characters depends on what <em>key</em> was used to encode the text. In this context, that key is called a <em>character encoding</em>.</p>
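<p>As a rough illustration of the idea of a key, the following Python sketch encodes a short devanagari word to bytes and then decodes those bytes twice, once with the right key and once with the wrong one. The sample word and the choice of ISO-8859-1 as the wrong key are assumptions made for the example.</p>

```python
# Hypothetical sample text: the word "Hindi" in devanagari, written as escapes.
text = "\u0939\u093f\u0928\u094d\u0926\u0940"

# Encode the characters to bytes using UTF-8 as the key.
utf8_bytes = text.encode("utf-8")

# Number of characters versus number of bytes used to store them.
chars_and_bytes = (len(text), len(utf8_bytes))

# Decoding with the same key recovers the original characters.
right_key = utf8_bytes.decode("utf-8")

# Decoding with a different key (here ISO-8859-1) yields unrelated characters.
wrong_key = utf8_bytes.decode("iso-8859-1")

print(chars_and_bytes)
print(right_key == text)
print(wrong_key == text)
```

<p>Each devanagari character here takes three bytes in UTF-8, which is one case of more than one byte representing a single character.</p>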
<p>There are many character encodings to choose from.</p>
<p>The person who created the content of the page you want to read should have used a character encoding that supports devanagari characters, but it should also be a character encoding that is widely recognised by browsers and available in editors. By far the best character encoding to use (for any language in the world) is called UTF-8.  </p>
<p>UTF-8 is strongly recommended by the HTML5 draft specification.</p>
<p>There should be a <em>character encoding declaration</em> associated with the HTML code of your page to say what encoding was used.  Otherwise the browser may not interpret the bytes correctly. It is also crucial that the text is actually <em>stored</em> in that encoding too.  That means that the person creating the content must choose that encoding when they save the page from their editor.  It&#8217;s not possible to change the encoding of text simply by changing the character encoding declaration in the HTML code, because the declaration is there just to indicate to the browser what key to use to get at the already encoded text.</p>
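<p>The difference between changing the declaration and changing the stored encoding can be sketched in Python. The helper names <code>relabel</code> and <code>transcode</code> are hypothetical, and windows-1252 and UTF-16 are just example encodings.</p>

```python
def relabel(data: bytes, declared: str) -> str:
    """Interpret the stored bytes with a declared encoding; the bytes do not change."""
    return data.decode(declared, errors="replace")


def transcode(data: bytes, stored: str, target: str) -> bytes:
    """Actually change the encoding: decode with the stored key, re-encode with the target."""
    return data.decode(stored).encode(target)


# Hypothetical page content, saved from the editor as UTF-8.
text = "\u0939\u093f\u0928\u094d\u0926\u0940"
stored = text.encode("utf-8")

# Declaring a different encoding only changes how the same bytes are read.
garbled = relabel(stored, "windows-1252")

# Re-saving in another encoding produces different bytes for the same text.
utf16 = transcode(stored, "utf-8", "utf-16-le")

print(garbled == text)
print(utf16.decode("utf-16-le") == text)
```

<p>Only the second route gives text that reads correctly under the new declaration.</p>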
<p>It&#8217;s one thing for the browser to know how to interpret the bytes to represent your text, but the browser must also have a way to make those characters stored in memory appear on the screen.</p>
<p>A font is essential here. Fonts contain instructions for displaying a character or a sequence of characters so that you can read them.  The visual representation of a character is called a glyph. The font converts characters to glyphs. </p>
<p>The font has tables to map the bytes in memory to text. To do this, the font needs to recognise the character encoding your page uses, and have the necessary tables to convert the characters to glyphs.  It is important that the font used can work with the character encoding used in the page you want to view.  Most fonts these days support UTF-8 encoded text.</p>
<p>Very simple fonts contain one glyph for each letter of the alphabet.  This may work for English, but it wouldn&#8217;t work for a complex script such as devanagari.  In these scripts the positioning and interaction of characters has to be modified according to the context in which they are displayed.  This means that the font needs additional information about how to choose and position glyphs depending on the context.  That information may be built into the font itself, or the font may rely on information on your system.</p>
<h4>Character encoding support</h4>
<p>The browser needs to be able to recognise the character encoding used in order to correctly interpret the mapping between bytes and characters.</p>
<p>If the character encoding of the page is incorrectly declared, or not declared at all, there will be problems viewing the content. Typically, a browser allows the user to manually apply a particular encoding by selecting the encoding from the menu bar.</p>
<p>All browsers should support the UTF-8 character encoding.  </p>
<p>Sometimes people use an encoding that is not designed for devanagari support with a font that produces the right glyphs nevertheless.  Such approaches are fraught with issues and present poor interoperability on several levels. For example, the content can only be interpreted correctly by applying the specifically designed font; no other font will do if that font is not available. Also, the meaning of the text cannot be derived by machine processing, for web searches, etc., and the data cannot be easily copied or merged with other text (eg. to quote a sentence in another article that doesn&#8217;t use the same encoding). This practice seriously damages the openness of the Web and should be avoided at all costs.</p>
<h4>System font support</h4>
<p>Usually, a web page will rely on the operating system to provide a devanagari font.  If there isn&#8217;t one, users won&#8217;t be able to see the Hindi text.  The browser doesn&#8217;t supply the font, it picks it up from whatever platform the browser is running on.</p>
<p>If the browser is running on a desktop computer, there may be a font already installed. If not, it should be possible to download free or commercial fonts and install them. If the user is viewing the page on a mobile device, it may currently be difficult to download and install one.</p>
<p>If there are several devanagari fonts on a system, the browser will usually pick one by default.  However, if the web page uses CSS to apply styling to the page, the CSS code may specify one or more particular fonts to use for a given piece of content. If none of these are available on the system, most browsers will fall back to the default; Internet Explorer, however, will show square boxes instead.</p>
<h4>Webfonts</h4>
<p>Another way of getting a font onto the user&#8217;s system is to download it with the page, just like images are downloaded with the page.  This is done using CSS code. The CSS code to do this has been defined for some years, but unfortunately most browsers&#8217; implementation of this feature is still problematic.</p>
<p>Recently a number of major browsers (though not Internet Explorer) have begun to support download of raw truetype or opentype fonts. This involves simply loading the ordinary font onto a server and downloading it to the browser when the page is displayed. Although the font may be cached as the user moves from page to page, there may still be some significant issues when dealing with complex scripts or Far Eastern languages (such as Chinese, Japanese and Korean) due to the size of the fonts used. The size of these fonts can often be counted in megabytes rather than kilobytes.</p>
<p>It is important to observe licensing restrictions when making fonts available for download in this way. The CSS mechanism doesn&#8217;t contain any restrictions related to font licences, but there are ways of preparing fonts for download that take into consideration some aspects of this issue &#8211; though not enough to provide a watertight restriction on font usage. </p>
<p>Microsoft makes available a program to create .eot fonts from ordinary true/opentype fonts. Eot font files can apply some usage restrictions and also subset the font to include only the characters used on the page.  The subsetting feature is useful when only a small amount of text appears in a given font, but for a whole page in, say, devanagari script it is of little use &#8211; particularly if the user is to input text in forms.  The biggest problem with .eot files, however, is that they are only supported by Internet Explorer, and there are no plans to support .eot format on other browsers.</p>
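<p>To give a feel for what subsetting to the characters used on the page involves, here is a minimal Python sketch that lists the distinct characters a piece of text needs. The function name <code>characters_needed</code> and the sample text are assumptions for the example; this is not how the Microsoft tool works internally, and a real subsetter would also have to keep the glyphs and rendering rules those characters depend on.</p>

```python
def characters_needed(page_text: str) -> list[str]:
    """Return the distinct non-whitespace characters in the text as sorted U+XXXX labels."""
    distinct = {ch for ch in page_text if not ch.isspace()}
    return [f"U+{ord(ch):04X}" for ch in sorted(distinct)]


# Hypothetical page text: two short Hindi words written as escapes.
sample = "\u0939\u093f\u0928\u094d\u0926\u0940 \u0939\u0948"
needed = characters_needed(sample)

print(len(sample), "characters of text need", len(needed), "distinct characters")
print(needed)
```

<p>For a short heading the list stays small, but a full page of devanagari text, or a form that accepts arbitrary input, soon needs most of the font.</p>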
<p>The W3C is currently working on the WOFF format. Fonts converted to WOFF format can have some gentle protection with regard to use, and also apply significant compression to the font being downloaded.  WOFF is currently only supported by Firefox, but all other major browsers are expected to provide support for the new format.</p>
<p>For this to work well, all browsers must support the same type of font download.</p>
<h4>Beyond fonts</h4>
<p>Complex scripts, such as those used for Indic and South East Asian languages, need to choose glyph shapes and positions and substitute ligatures, etc. according to the context in which characters are used. These adjustments can be accomplished using the features of OpenType fonts.  The browser must be able to implement those opentype features.</p>
<p>Often a font will also rely on operating system support for some subset of the complex script rendering.  For example, a devanagari font may rely on the Windows uniscribe dll for things like positioning of left-appended vowel signs, rather than encoding that behaviour into the font itself.  This reduces the size and complexity of the font, but exposes a problem when using that font on a variety of platforms.  Unless the operating system can provide the same rendering support, the text will look only partially correct. Mobile devices must either provide something similar to uniscribe, or fonts used on the mobile device must include all needed rendering features.</p>
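<p>The left-appended vowel sign case can be made concrete by listing the characters of a devanagari word in the order they are stored. This Python sketch uses the standard <code>unicodedata</code> module; the sample word is an assumption chosen for the example.</p>

```python
import unicodedata

# Hypothetical sample: the word "Hindi" in devanagari, written as escapes.
word = "\u0939\u093f\u0928\u094d\u0926\u0940"

# Unicode names of the characters, in the order they are stored in memory.
stored_order = [unicodedata.name(ch) for ch in word]

for position, name in enumerate(stored_order):
    print(position, name)
```

<p>In memory the vowel sign I comes after the letter HA, yet on screen it has to appear to the left of it, and the NA, VIRAMA, DA sequence has to be drawn as a conjunct. Something has to do that rearranging, whether it is the font, the operating system, or both.</p>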
<p>Browsers that do font linking must also support the necessary opentype features and obtain functionality from the OS rendering support where needed.</p>
<p>If tools are developed to subset webfonts, the subsetting must not remove the rendering logic needed for correct display of the text.</p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&amp;p=445</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Bengali script notes updated</title>
		<link>http://rishida.net/blog/?p=374</link>
		<comments>http://rishida.net/blog/?p=374#comments</comments>
		<pubDate>Sat, 06 Feb 2010 09:34:42 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[script notes]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[writings]]></category>
		<category><![CDATA[bengali]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=374</guid>
		<description><![CDATA[

If you&#8217;re interested, I just did a major overhaul of my script notes on Bengali in Unicode. There&#8217;s a new section about which characters to use when there are multiple options (eg. RRA vs. DDA+nukta), and the page provides information about more characters from the Bengali block in Unicode (including those used in Bengali&#8217;s amazingly [...]]]></description>
			<content:encoded><![CDATA[<div>
<p style="float: right; width: 260px; margin-left: 1em; margin-bottom: 1em"><img src="http://rishida.net/blog/images/bengalinotes.png" alt="Characters in the Unicode Bengali block." /></p>
<p>If you&#8217;re interested, I just did a major overhaul of my <a href="http://rishida.net/scripts/bengali/">script notes on Bengali</a> in Unicode. There&#8217;s a new section about which characters to use when there are multiple options (eg. RRA vs. DDA+nukta), and the page provides information about more characters from the Bengali block in Unicode (including those used in Bengali&#8217;s amazingly complicated currency notation prior to 1957).</p>
<p>In addition, this has all  been squeezed into the latest look and feel for script notes pages.</p>
<p>The new page is at a new location.  There is a redirect on the old page.</p>
<p>Hope it&#8217;s useful.
</p></div>
<p style="float:left; font-size: 150%"><a href="http://rishida.net/scripts/bengali/">&gt;&gt; Read it</a></p>
<p><br clear="all" /></p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&amp;p=374</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Myanmar (Burmese) script notes ready</title>
		<link>http://rishida.net/blog/?p=169</link>
		<comments>http://rishida.net/blog/?p=169#comments</comments>
		<pubDate>Mon, 06 Oct 2008 10:48:15 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[script notes]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[writings]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=169</guid>
		<description><![CDATA[&#62;&#62; Read it !

I finally got to the point, after many long early morning hours, where I felt I could remove the &#8216;Draft&#8217; from the heading of my Myanmar (Burmese) script notes.
This page is the result of my explorations into how the Myanmar script is used for the Burmese language in the context of the [...]]]></description>
			<content:encoded><![CDATA[<p style="font-size: 150%;"><a href="http://rishida.net/scripts/myanmar/">&gt;&gt; Read it !</a></p>
<p style="float: right; margin: 1px solid teal; margin-left: 1em; margin-bottom: 1em;"><a href="http://rishida.net/blog/images/myanmarpicker7.png"><img src="http://rishida.net/blog/images/myanmarexcerpt.png" alt="Picture of the page in action."/></a></p>
<p>I finally got to the point, after many long early morning hours, where I felt I could remove the &#8216;Draft&#8217; from the heading of my Myanmar (Burmese) script notes.</p>
<p>This page is the result of my explorations into how the Myanmar script is used for the Burmese language in the context of the Unicode Myanmar block. It takes into account the significant changes introduced in Unicode version 5.1 in April of this year.</p>
<p>Btw, if you have JavaScript running you can get a list of characters in the examples by mousing over them.  If you don&#8217;t have JS, you can link to the same information.</p>
<p>There&#8217;s also a <a href="http://rishida.net/scripts/myanmar/myanmar.pdf">PDF version</a>, if you don&#8217;t want to install the (free) fonts pointed to for the examples.</p>
<p>Here is a summary of the script:</p>
<div style="background-color: #F0FFF0; padding: 10px; padding-bottom: 5px; margin-bottom: 10px;">
<p>Myanmar is a tonal language and is syllable-based. The script is an abugida, ie. consonants carry an inherent vowel sound that is overridden using vowel signs.</p>
<p>Spaces are used to separate phrases, rather than words. Words can be separated with ZWSP to allow for easy wrapping of text.</p>
<p>Words are composed of syllables. These start with a consonant or initial vowel. An initial consonant may be followed by a medial consonant, which adds the sound j or w. After the vowel, a syllable may end with a nasalisation of the vowel or an unreleased glottal stop, though these final sounds can be represented by various different consonant symbols.</p>
<p>At the end of a syllable a final consonant usually has an &#8216;asat&#8217; sign above it, to show that there is no inherent vowel.</p>
<p>In multisyllabic words derived from an Indian language such as Pali, where two consonants occur internally with no intervening vowel, the consonants tend to be stacked vertically, and the asat sign is not used.</p>
<p>Text runs from left to right.</p>
<p>There is a set of Myanmar numerals, which are used just like Latin digits.
</p></div>
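<p>The point in the summary about separating words with ZWSP can be sketched in Python. The two Burmese words and the helper names are assumptions for illustration only.</p>

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE: invisible, but allows a line break


def join_words(words: list[str]) -> str:
    """Join the words of a phrase with ZWSP so the text can wrap between them."""
    return ZWSP.join(words)


def break_opportunities(phrase: str) -> list[int]:
    """Return the index of every ZWSP in the phrase."""
    return [i for i, ch in enumerate(phrase) if ch == ZWSP]


# Hypothetical two-word phrase, written as escapes.
words = ["\u1019\u103c\u1014\u103a\u1019\u102c", "\u1005\u102c"]
phrase = join_words(words)

print(len(phrase))
print(break_opportunities(phrase))
```

<p>The phrase looks unbroken on screen, but a renderer that honours ZWSP can wrap the line at the marked position.</p>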
<p>So, what next?  I&#8217;m quite keen to get to Mongolian.  That looks really complicated. But I&#8217;ve been telling myself for a while that I ought to look at Malayalam or Tamil, so I think I&#8217;ll try Malayalam.</p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&amp;p=169</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Moving paper</title>
		<link>http://rishida.net/blog/?p=151</link>
		<comments>http://rishida.net/blog/?p=151#comments</comments>
		<pubDate>Mon, 07 Apr 2008 17:18:36 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[writings]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=151</guid>
		<description><![CDATA[I&#8217;m sitting here watching a video of Timbl talking on a BBC news page and I suddenly realised how good this was.
The page design helps give the impression &#8211; there are no clunky boxes around the video itself &#8211; but there&#8217;s also no need to view in a different area, or switch to another tool, [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m sitting here watching a video of Timbl talking on a <a href="http://news.bbc.co.uk/1/hi/technology/7299875.stm">BBC news page</a> and I suddenly realised how good this was.</p>
<p>The page design helps give the impression &#8211; there are no clunky boxes around the video itself &#8211; but there&#8217;s also no need to view in a different area, or switch to another tool, or even wait for a download to get started &#8211; it&#8217;s just there as part of the page, but a part that moves and produces sound.  Kind of like the moving paper in Harry Potter&#8217;s world.</p>
<p>It&#8217;s great how technology marches on sometimes.</p>
<p>[Update: Since I wrote the above the video has acquired grey panels around the edges for controls, which I think is a shame.  It's still pretty good technology though. ]</p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&amp;p=151</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Be flexible when referencing Unicode!</title>
		<link>http://rishida.net/blog/?p=135</link>
		<comments>http://rishida.net/blog/?p=135#comments</comments>
		<pubDate>Mon, 04 Feb 2008 19:59:53 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[writings]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=135</guid>
		<description><![CDATA[This post is about the dangers of tying a specification, protocol or application to a specific version of Unicode. 
For example, I was in a discussion last week about XML, and the problems caused by the fact that XML 1.0 is currently tied to a specific version of Unicode, and a very old version at [...]]]></description>
			<content:encoded><![CDATA[<p>This post is about the dangers of tying a specification, protocol or application to a specific version of Unicode. </p>
<p>For example, I was in a discussion last week about XML, and the problems caused by the fact that XML 1.0 is currently tied to a specific version of Unicode, and a very old version at that (2.0). This affects what characters you can use for things such as element and attribute names, enumerated lists for attribute values, and ids. Note that I&#8217;m <em>not</em> talking about the content, just those names. </p>
<p>I <a href="http://www.w3.org/2005/03/02-ishida-tech-plen/">spoke about this</a> at a W3C Technical Plenary some time back in terms of how this bars   people from using certain aspects of XML applications in their own language if they use scripts that have been added to Unicode since version 2.0. This includes over 150 million people speaking languages written with Ethiopic, Canadian Syllabics, Khmer, Sinhala, Mongolian, Yi, Philippine, New  Tai Lue, Buginese, Cherokee, Syloti Nagri, N&#8217;Ko,  Tifinagh and other scripts.</p>
<p>This means, for example, that if your language is written with one of these scripts, and you write some XHTML that you want to be valid (so you can use it with AJAX or XSLT, etc.), you can&#8217;t use the same language for an <a href="#ኢሺዳ" name="ኢሺዳ" id="ኢሺዳ">id attribute value</a> as for the content of your page. (Try <a href="http://validator.w3.org/check?verbose=1&#038;uri=http%3A%2F%2Frishida.net%2Fblog%2F">validating this page</a> now. The previous link used some Ethiopic for the name and id attribute values.) </p>
<p><strong>But there&#8217;s another issue</strong> that hasn&#8217;t received so much press &#8211; and yet I think, in its own way, it can be just as problematic. Scripts that were supported by Unicode 2.0 have not stood still, and additional characters are being added to such scripts with every new Unicode release. In some cases these characters will see very general use. Take, for example, the Bengali character U+09CE BENGALI LETTER KHANDA TA.</p>
<p>With the release of Unicode 4.1 this character was added to the standard, with a clear admonition that it should in future be used in text, rather than the workaround people had been using previously.</p>
<p>This is not a rarely used character. It is a common part of the alphabet. Put Bengali <a href="#উতসহ" name="উতসহ" id="উতসহ">in a link</a> and you&#8217;re generally ok. Include <a href="#কুৎসিত" name="কুৎসিত" id="কুৎসিত">a khanda ta letter</a> in it, though, and you&#8217;re in trouble. It&#8217;s as if English speakers could use any word in an id, as long as it didn&#8217;t have a &#8216;q&#8217; in it. It&#8217;s a recipe for confusion and frustration.</p>
<p>Similar, but much more far reaching, changes will be introduced to the Myanmar script (used for Burmese) in the upcoming version 5.1. Unlike the khanda ta, these changes will affect almost every word. So if your application or protocol froze its Unicode support to a version between 3.0 and 5.0, like IDNA, you will suddenly be disenfranchising Burmese users who had been perfectly happy until now.</p>
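<p>Python happens to ship a copy of the Unicode 3.2 character database alongside its current one (as <code>unicodedata.ucd_3_2_0</code>), which gives a rough way to see what a frozen version looks like from the inside. In this sketch the helper <code>assigned_in</code> is a hypothetical name, and the Myanmar character chosen is just one of the 5.1 additions.</p>

```python
import unicodedata


def assigned_in(ucd, ch: str) -> bool:
    """True if the given Unicode database assigns a character to this code point."""
    return ucd.category(ch) != "Cn"


frozen = unicodedata.ucd_3_2_0   # database frozen at Unicode 3.2
current = unicodedata            # whatever version this Python ships with

samples = {
    "BENGALI LETTER KA": "\u0995",
    "BENGALI LETTER KHANDA TA": "\u09ce",
    "MYANMAR CONSONANT SIGN MEDIAL YA": "\u103b",
}

for label, ch in samples.items():
    print(label, assigned_in(frozen, ch), assigned_in(current, ch))
```

<p>An application that consults only the frozen database treats the khanda ta and the Myanmar medial as unassigned, while the long-established letter is accepted.</p>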
<p>Here are a few more examples (provided by Ken Whistler) of characters added to Unicode after the initial script adoption that will raise eyebrows for people who speak the relevant language: </p>
<ul>
<li>01F9 LATIN SMALL LETTER N WITH GRAVE: shows up in NFC pinyin data for Chinese.</li>
<li>0219 LATIN SMALL LETTER S WITH COMMA BELOW:  Romanian data.</li>
<li>0450 CYRILLIC SMALL LETTER IE WITH GRAVE: Macedonian in  NFC.</li>
<li>0653..0655 Arabic combining maddah and hamza:  Implicated in  NFC normalization of common Arabic letters now.</li>
<li>0972 DEVANAGARI LETTER CANDRA A: Marathi.</li>
<li>097B DEVANAGARI LETTER GGA: Sindhi.</li>
<li>0B35 ORIYA LETTER VA:  Oriya.</li>
<li>0BB6 TAMIL LETTER SHA:  Needed to  spell sri.</li>
<li>0D7A..0D7F   Malayalam chillu letters:  Those will be  ubiquitous in Malayalam data, post Unicode 5.1.</li>
<li>and a bunch of Chinese additions. </li>
</ul>
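<p>Several of the entries in this list matter because of NFC normalization: text typed as a base letter plus a combining mark turns into the later-added character as soon as it is normalized. A small Python sketch, with sample sequences chosen as assumptions for the example:</p>

```python
import unicodedata

# Base letter plus combining mark, as the text might be typed or stored.
decomposed = {
    "pinyin n with grave": "n\u0300",
    "Romanian s with comma below": "s\u0326",
    "Macedonian ie with grave": "\u0435\u0300",
}

# NFC replaces each sequence with a single precomposed character.
composed = {label: unicodedata.normalize("NFC", seq) for label, seq in decomposed.items()}
code_points = {label: f"U+{ord(ch):04X}" for label, ch in composed.items()}

for label, cp in code_points.items():
    print(label, cp)
```

<p>So even data that never contained these characters explicitly can end up containing them after normalization.</p>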
<p>So the moral is this: decouple your application, protocol or specification from a specific version of the Unicode Standard. Allow new characters to be used by people as they come along, and users all around the world will thank you.</p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&amp;p=135</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Negotiating the Iron Curtain</title>
		<link>http://rishida.net/blog/?p=129</link>
		<comments>http://rishida.net/blog/?p=129#comments</comments>
		<pubDate>Thu, 20 Dec 2007 18:51:31 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[writings]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=129</guid>
		<description><![CDATA[I just read a post by Ivan Herman about how Hungary has joined the Schengen Agreement, and will soon be removing border controls on the EU side.  That put me in mind of the first time I tried to pass through the Iron Curtain.
I was travelling from Vienna to Budapest (probably about 25 years [...]]]></description>
			<content:encoded><![CDATA[<p>I just read a <a href="http://ivanherman.wordpress.com/2007/12/20/schengen-non%e2%80%93borders/">post by Ivan Herman</a> about how Hungary has joined the Schengen Agreement, and will soon be removing border controls on the EU side.  That put me in mind of the first time I tried to pass through the Iron Curtain.</p>
<p>I was travelling from Vienna to Budapest (probably about 25 years ago) and I had decided to go through <a href="http://maps.google.com/maps?q=sopron">Sopron</a>, rather than Hegyeshalom, so I could see something a little more off the beaten track. This was a month-long InterRail trip, so I was able to follow my whim and jump on whatever train I wanted. The train connections worked, and I found myself heading south from Vienna.</p>
<p>Eventually, the train passed into Hungary and stopped. I needed a visa, so I got off with a bunch of other (Hungarian looking) people, and traipsed over to a small outbuilding, where I found myself at the back of a queue of people jostling bags of various sizes and dressed and coiffed in what looked to me to be a very Eastern European fashion.  Looking out of the window, everything was grey. I could see rail tracks and points and small, grey buildings but also several very tall towers with machine gun nests perched on top (quite large looking machine guns). The queue moved slowly, and I was surprised at one point to see my train pulling away and disappearing.  It seemed a bit odd (and I was glad I&#8217;d brought all my stuff with me), but I figured this was probably normal, and I&#8217;d just have to catch another train.</p>
<p>I finally arrived at the desk and asked for a visa. The guy behind the desk started talking to me in a somewhat animated fashion, but I had no idea what he was saying. I hadn&#8217;t learned German yet, and Hungarian was completely incomprehensible to me.  I kept trying to explain, politely, in English, that I needed a visa.  Finally, he gave me an exasperated look and called someone out of a nearby room.  The guy who emerged was huge, bald and intimidatingly business-like. (Some time later I saw the film Midnight Express, and realised that the prison guard and he could have been the same person.) He shouted at me &#8220;Nicht visa!&#8221;.  And I tried to explain, in English, that, yes, I had no visa, but would like to obtain one, please. This didn&#8217;t appear to get across clearly, because he simply repeated &#8220;Nicht visa!!&#8221; several times, increasing in volume.</p>
<p>Finally, the tension broke and gave way to action.  He motioned for me to follow him out of the building, and we started walking away across a couple of sets of railway tracks.  I noticed, feeling slightly less at ease but still hopeful, that I was flanked on either side by a soldier with a gun. They weren&#8217;t exactly giving me encouraging looks, and as I glanced up at the machine gun towers and at the surrounding barbed wire, I began to wish I knew what was happening.</p>
<p>Soon we arrived at the end of a short train. The very last carriage of this train looked like something you&#8217;d expect to see in a Wild West film.  It had a kind of standing area at each end with a railing, a door into the carriage and steps leading down to the ground on either side.  I was ushered up one set of steps and into what turned out to be an empty carriage. The door was shut behind me, and within a minute or so, as I remember it, the train started moving off, in the same direction my earlier train had disappeared. So I wasn&#8217;t just being sent back across the border. </p>
<p>That last realisation started to trouble me a little, since I still had no visa and no idea what was happening.  It didn&#8217;t help that there was a small round window in the door at each end of the carriage, through which I could see  guards sitting on the steps at each corner, all holding machine guns at the ready.  As the towers slid away behind us, night started to fall.</p>
<p>Twenty-five years have dulled the memory of some of what happened next, but eventually I got off at a small station, having reached the end of the line. The guards were gone, and the station turned out to be quite modern and clean looking. I still couldn&#8217;t understand anything anyone was saying, so I still had no idea where I was, but I was able to figure out that I was somehow back in Austria.  It was much later that I was to realise that Sopron is on a peninsula that sticks into Austria, and I had come in one side and been sent out the other.</p>
<p>I slept that night on the floor of the main station building, and the next morning set off to find someone who spoke English and could tell me where I was &#8211; and just as importantly how to get into Hungary.  The town was quite small, maybe just a village.  In spite of that it took me a while, but I eventually came across a chap in a supermarket who was able to explain to me that visas are not issued on entry into Hungary by train via Sopron.  I was ahead of him there.  He also offered to drive me to the border, telling me that I would be able to get a visa at the road entry point.</p>
<p>It&#8217;s nice to think about that person whenever I relive this story.  He really went out of his way, leaving work to assist a complete stranger, with no fuss or thought for reward.  I wonder whether he remembers me.  I doubt it. Of course, these days he may even be reading this blog post&#8230;</p>
<p>So it was that, eventually, I got the stamp in my passport that I needed, and somehow found my way onto another train heading for Budapest.  Well, it wasn&#8217;t quite the end of the fun.  That continued when I tried to meet up with my father in the capital.  But that, as they say, is another story&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&amp;p=129</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>No break space issue in Firefox</title>
		<link>http://rishida.net/blog/?p=123</link>
		<comments>http://rishida.net/blog/?p=123#comments</comments>
		<pubDate>Thu, 29 Nov 2007 10:03:35 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[writings]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=123</guid>
		<description><![CDATA[Tim Greenwood just pointed out to me a &#8216;bug&#8217; in my converter program, which I think is actually a bug in Firefox (although I imagine it was implemented by someone as a feature).
If you type A0 (the hex code for a non-breaking space) in the Hexadecimal code points field, then press Convert, [...]]]></description>
			<content:encoded><![CDATA[<p>Tim Greenwood just pointed out to me a &#8216;bug&#8217; in my <a href="http://rishida.net/scripts/uniview/conversion.php?codepoints=00A0&#038;origin=codepoint">converter program</a>, which I think is actually a bug in Firefox (although I imagine it was implemented by someone as a feature).</p>
<p>If you type A0 (the hex code for a non-breaking space) in the Hexadecimal code points field, then press Convert, you will get a blank space in the Characters field that should be U+00A0 NO-BREAK SPACE.  Then press Convert or View Names above this Characters field and you&#8217;ll find that what was supposed to be an NBSP has changed into an ordinary space.  IE7, Opera and Safari all continue to show the character in the field as an NBSP.  </p>
<p>(However, all four browsers substitute an ordinary space when you copy and paste the text from the Characters field into something else.)</p>
<p>I tried this with <a href="http://rishida.net/scripts/uniview/?codepoints=2002%202003%202006%202008%202009%20200B">a range of other types of space</a>, but saw no such behaviour (<a href="http://rishida.net/scripts/uniview/conversion.php?codepoints=2002%202003%202006%202008%202009%20200B&#038;origin=codepoint">try it</a>).  They all remained themselves.  </p>
<p>Anyone know what this is about?</p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&amp;p=123</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Web Standards support day</title>
		<link>http://rishida.net/blog/?p=122</link>
		<comments>http://rishida.net/blog/?p=122#comments</comments>
		<pubDate>Mon, 26 Nov 2007 17:54:01 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[writings]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=122</guid>
		<description><![CDATA[
 
 
 
  Blue Beanie Day
  
  Originally uploaded by r12a
 

Monday, November 26, 2007 is the day thousands of Standardistas (people who support web standards) will wear a Blue Beanie to show their support for accessible, semantic, and hopefully internationalized web content.
I haven&#8217;t got a blue hat, so I cheated [...]]]></description>
			<content:encoded><![CDATA[<div style="float: right; margin-left: 10px; margin-bottom: 10px;">
 <a href="http://www.flickr.com/photos/ishida/2066406474/" title="photo sharing"><img src="http://farm3.static.flickr.com/2182/2066406474_00e6cb1290_m.jpg" alt="" style="border: solid 2px #000000;" /></a><br />
 <br />
 <span style="font-size: 0.9em; margin-top: 0px;"><br />
  <a href="http://www.flickr.com/photos/ishida/2066406474/">Blue Beanie Day</a><br />
  <br />
  Originally uploaded by <a href="http://www.flickr.com/people/ishida/">r12a</a><br />
 </span>
</div>
<p>Monday, November 26, 2007 is the day thousands of Standardistas (people who support web standards) will wear a Blue Beanie to show their support for accessible, semantic, and hopefully internationalized web content.</p>
<p>I haven&#8217;t got a blue hat, so I cheated a little by borrowing bits of the cover of Jeffrey Zeldman&#8217;s great book, &#8220;Designing with Web Standards&#8221;. That&#8217;s me under the hat though.</p>
<p>(If you&#8217;re wondering, the text on the left says the same as the text at top right, in Arabic, Urdu, Inuktitut, Simplified Chinese, Traditional Chinese, Kazakh, Greek, Dzongkha, Ethiopian, Hebrew, Hindi, Nepali, Japanese, Korean, Hungarian, Punjabi, Thai and Venda.)</p>
<p>See the <a href="http://www.flickr.com/groups/bluebeanieday2007/pool/">Flickr pool</a>.<br />
<br clear="all" /></p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&amp;p=122</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
