<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>ishida &#62;&#62; blog &#187; web</title>
	<atom:link href="http://rishida.net/blog/?tag=w3c&#038;feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://rishida.net/blog</link>
	<description>News of changes to my main site, and W3C related posts.</description>
	<lastBuildDate>Thu, 18 Apr 2013 16:54:26 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>HTML5 adds new translate attribute</title>
		<link>http://rishida.net/blog/?p=831</link>
		<comments>http://rishida.net/blog/?p=831#comments</comments>
		<pubDate>Wed, 22 Feb 2012 11:07:07 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[writings]]></category>
		<category><![CDATA[html5]]></category>
		<category><![CDATA[translate]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=831</guid>
		<description><![CDATA[A translate attribute was recently added to HTML5. At the three MultilingualWeb workshops we have run over the past two years, the idea of this kind of &#8216;translate flag&#8217; has constantly excited strong interest from localizers, content creators, and from folks working with language technology. How it works Typically authors or automated script environments will [...]]]></description>
				<content:encoded><![CDATA[<p>A <code class="kw">translate</code> attribute was recently <a href="http://dev.w3.org/html5/spec/global-attributes.html#the-translate-attribute">added to HTML5</a>. At the three <a href="http://multilingualweb.eu/">MultilingualWeb workshops</a> we have run over the past two years, the idea of this kind of &#8216;translate flag&#8217; has consistently attracted strong interest from localizers, content creators, and folks working with language technology.</p>
<h3>How it works</h3>
<p>Typically authors or automated script environments will put the attribute in the markup of a page. You may also find that, in industrial translation scenarios, localizers may add attributes during the translation preparation stage, as a way of avoiding the multiplicative effects of dealing with mistranslations in a large number of languages.</p>
<p>There is no effect on the rendered page (although you could, of course, style it if you found a good reason for doing so). The attribute will typically be used by workflow tools when the time comes to translate the text – be it by the careful craft of human translators, or by quick gist-translation APIs and services in the cloud.</p>
<p>The attribute can appear on any element, and it takes just two values: <code class="kw">yes</code> or <code class="kw">no</code>. If the value is <code class="kw">no</code>, translation tools should protect the text of the element from translation. The translation tool in question could be an automated translation engine, like those used in the online services offered by Google and Microsoft. Or it could be a human translator&#8217;s &#8216;workbench&#8217; tool, which would prevent the translator inadvertently changing the text.</p>
<p>Setting this translate flag on an element applies the value to all contained elements and to all attribute values of those elements.</p>
<p>You don&#8217;t have to use <code>translate="yes"</code> for this to work. If a page has no <code class="kw">translate</code> attribute, a translation system or translator should assume that all the text is to be translated.  The <code class="kw">yes</code> value is likely to see little use, though it could be very useful if you need to override a translate flag on a parent element and indicate some bits of text that should be translated. You may want to translate the natural language text in examples of source code, for example, but leave the code untranslated.</p>
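<p>The inheritance and override behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular translation tool works: the class name <code>TranslateFlagParser</code>, the helper <code>translatable_runs</code> and the sample markup are all made up for this example, and the sketch only looks at element text, not attribute values.</p>

```python
from html.parser import HTMLParser


class TranslateFlagParser(HTMLParser):
    """Collect text runs together with the translate flag they inherit."""

    # Elements that never have an end tag, so they must not touch the stack.
    VOID = {"br", "hr", "img", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.flags = [True]  # default: all text is to be translated
        self.runs = []       # (text, translatable) pairs

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        value = (dict(attrs).get("translate") or "").lower()
        if value == "no":
            self.flags.append(False)
        elif value == "yes":
            self.flags.append(True)   # overrides a "no" on an ancestor
        else:
            self.flags.append(self.flags[-1])  # inherit from the parent

    def handle_endtag(self, tag):
        if tag not in self.VOID and len(self.flags) > 1:
            self.flags.pop()

    def handle_data(self, data):
        if data.strip():
            self.runs.append((data.strip(), self.flags[-1]))


def translatable_runs(markup):
    parser = TranslateFlagParser()
    parser.feed(markup)
    parser.close()
    return parser.runs


# Hypothetical sample: protected code with a translatable comment inside it.
sample = (
    '<pre translate="no">x = 1  '
    '<span translate="yes"># set the counter</span></pre>'
    "<p>Then run it.</p>"
)
for text, flag in translatable_runs(sample):
    print(flag, text)
```

<p>In this sample the code is reported as protected, while the comment inside it and the paragraph after it are reported as translatable, which is the source code scenario mentioned above.</p>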
<h3>Why it is needed</h3>
<p>You come across a need for this quite frequently. There is an example in the HTML5 spec about the Bee Game.  Here is a similar, but real example from my days at Xerox, where the documentation being translated referred to a machine with text on the hardware that wasn&#8217;t translated.</p>
<blockquote translate="no"><p><code>&lt;p&gt;Click the Resume button on the Status Display or the<br />
&lt;span class=&quot;panelmsg&quot; translate=&quot;no&quot;&gt;CONTINUE&lt;/span&gt; button<br />
on the printer panel.&lt;/p&gt;</code></p></blockquote>
<p>Here are a couple more (real) examples of content that could benefit from the <code class="kw">translate</code> attribute.  The first is from a book, quoting a title of a work.</p>
<blockquote translate="no"><p><code>&lt;p&gt;The question in the title &lt;cite translate=&quot;no&quot;&gt;How Far Can You Go?&lt;/cite&gt; applies to both the undermining of traditional religious belief by radical theology and the undermining of literary convention by the device of &quot;breaking frame&quot;...&lt;/p&gt;</code></p></blockquote>
<p>The next example is from a page about French bread – the French for bread is &#8216;<span lang="fr" xml:lang="fr" translate="no">pain</span>&#8217;.</p>
<blockquote translate="no"><p><code>&lt;p&gt;Welcome to &lt;strong translate=&quot;no&quot;&gt;french pain&lt;/strong&gt; on Facebook. Join now to write reviews and connect with &lt;strong translate=&quot;no&quot;&gt;french pain&lt;/strong&gt;. Help your friends discover great places to visit by recommending &lt;strong translate=&quot;no&quot;&gt;french pain&lt;/strong&gt;.&lt;/p&gt;</code></p></blockquote>
<p>So adding the translate attribute to your page can help readers better understand your content when they run it through automatic translation systems, and can save a significant amount of cost and hassle for translation vendors with large throughput in many languages.</p>
<h3>What about Google Translate and Microsoft Translator?</h3>
<p>Both Google and Microsoft online translation services already provided the ability to prevent translation of content by adding markup to your content, although they did it in (multiple) different ways. Hopefully, the new attribute will help significantly by providing a standard approach.</p>
<p>Both Google and Microsoft currently support <code>class="notranslate"</code>, but replacing a class attribute value with an attribute that is a formal part of the language makes this feature much more reliable, especially in wider contexts. For example, a translation prep tool would be able to rely on the meaning of the HTML5 <code class="kw">translate</code> attribute always being what is expected. Also it becomes easier to port the concept to other scenarios, such as other translation APIs or localization standards such as XLIFF.</p>
<p>As it happens, the online service of Microsoft (who actually proposed a translate flag for HTML5 some time ago) already supported <code>translate="no"</code>. This, of course, was a proprietary attribute until now, and Google didn&#8217;t support it. However, just yesterday morning I received word, by coincidence, that WebKit/Chromium has just added support for the <code class="kw">translate</code> attribute, and yesterday afternoon Google added support for <code>translate="no"</code> to its online translation service. <a href="http://www.w3.org/International/tests/html-css/translate/results-online">See the results</a> of some tests I put together this morning. (Neither yet supports the <code>translate="yes"</code> override.)</p>
<p>In these proprietary systems, however, there are a good number of other non-standard ways to express similar ideas, even just sticking with Google and Microsoft.</p>
<p>Microsoft apparently supports <code>style="notranslate"</code>.  This is not one of the options Google lists for their online service, but on the other hand they have things that are not available via Microsoft&#8217;s service.</p>
<p>For example, if you have an entire page that should not be translated, you can add <code>&lt;meta name=&quot;google&quot; value=&quot;notranslate&quot;&gt;</code> inside the <code class="kw">head</code> element of your page and Google won&#8217;t translate any of the content on that page. (However they also support <code>&lt;meta name=&quot;google&quot; content=&quot;notranslate&quot;&gt;</code>.)  This shouldn&#8217;t be Google specific, and a single way of doing this, ie. <code>translate="no"</code>  on the <code class="kw">html</code> tag, is far cleaner.</p>
<p>It&#8217;s also not made clear, by the way, when dealing with either translation service, how to make sub-elements translatable inside an element where <code class="kw">translate</code> has been set to <code class="kw">no</code> &#8211; which may sometimes be needed.</p>
<p>As already mentioned, the new HTML5 translate attribute provides a simple and standard feature of HTML that can replace and simplify all these different approaches, and will help authors develop content that will work with other systems too.</p>
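<p>As a rough illustration of how a tool could accept the older class-based convention alongside the standard attribute, here is a minimal Python sketch. The function name <code>translate_flag</code> is hypothetical, and the precedence (the standard attribute wins over <code>class="notranslate"</code>) is an assumption made for this example, not documented behaviour of either service.</p>

```python
def translate_flag(attrs, inherited=True):
    """Return True if the element's text should be translated.

    `attrs` is a plain dict of attribute names to values for one element;
    `inherited` is the flag computed for its parent.
    """
    value = (attrs.get("translate") or "").lower()
    if value == "no":
        return False
    if value == "yes":
        return True
    # Legacy convention: a "notranslate" token in the class list.
    if "notranslate" in (attrs.get("class") or "").split():
        return False
    return inherited


print(translate_flag({"class": "panelmsg notranslate"}))
print(translate_flag({"translate": "yes", "class": "notranslate"}))
```

<p>Treating the class as a fallback keeps existing pages working while letting the standard attribute have the final say.</p>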
<h3>Can&#8217;t we just use the lang attribute?</h3>
<p>It was inevitable that someone would suggest this during the discussions around how to implement a translate flag, but overloading language tags is not the solution. For example, a language tag can indicate which text is to be spellchecked against a particular dictionary. This has nothing to do with whether that text is to be translated or not.  They are different concepts.  In a document that has <code>lang="en"</code> on the <code class="kw">html</code> element, if you set <code>lang="notranslate"</code> lower down the page, that text will no longer be spellchecked, since the language is no longer English. (Nor, for that matter, will styling work, voice browsers pronounce the text correctly, etc.)</p>
<h3>Going beyond the translate attribute</h3>
<p>The W3C&#8217;s <a href="http://www.w3.org/TR/its/">ITS (International Tag Set) Recommendation</a> proposes the use of a translate flag such as the attribute just added to HTML5, but also goes beyond that in describing a way to assign translate flag values to particular elements or combinations of markup throughout a document or set of documents. For example, you could say, if it makes sense for your content, that by default, all <code class="kw">p</code> elements with a particular class name should have the translate flag set to <code class="kw">no</code> for a specific set of documents.</p>
<p>Microsoft offers something along these lines already, although it is much less powerful than the ITS approach. If you use <code>&lt;meta name=&quot;microsoft&quot; content=&quot;notranslateclasses myclass1 myclass2&quot; /&gt;</code> anywhere on the page (or as part of a widget snippet) it ensures that any of the CSS classes listed following “notranslateclasses” should behave the same as the “notranslate” class.</p>
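<p>To make the rule-based idea concrete, here is a minimal Python sketch that assigns translate flags from a list of rules instead of marking up each element by hand. It is loosely modelled on the ITS notion of global rules but is not an ITS implementation: the <code>RULES</code> list, the <code>partnumber</code> class and the <code>apply_rules</code> helper are invented for the example, and the paths use the limited XPath subset of Python&#8217;s ElementTree (the class test is an exact match on the whole attribute value).</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set: (ElementTree path, translate value).
RULES = [
    (".//p[@class='partnumber']", "no"),
]


def apply_rules(root, rules):
    """Set translate on every element a rule selects.

    Elements that already carry their own translate attribute keep it,
    so local markup takes precedence over the rules.
    """
    for path, value in rules:
        for element in root.findall(path):
            element.attrib.setdefault("translate", value)
    return root


doc = ET.fromstring(
    "<body>"
    "<p class='partnumber'>XJ-900</p>"
    "<p>Replace the toner cartridge.</p>"
    "<p class='partnumber' translate='yes'>see list</p>"
    "</body>"
)
apply_rules(doc, RULES)
flags = [p.get("translate") for p in doc.findall("p")]
print(flags)
```

<p>The same rule list could be applied to a whole set of documents, which is the main attraction of keeping the rules separate from the content.</p>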
<p>Microsoft and Google&#8217;s translation engines also don&#8217;t translate content within <code class="kw">code</code> elements.  Note, however, that you don&#8217;t seem to have any choice about this – there don&#8217;t seem to be instructions about how to override this if you do want your <code class="kw">code</code> element content translated.</p>
<p>By the way, there are plans afoot to set up a new MultilingualWeb-LT Working Group at the W3C in conjunction with a European Commission project to further develop ideas around the ITS spec, and create reference implementations. They will be looking, amongst many other things, at ways  of integrating the new <code class="kw">translate</code> attribute into localization industry workflows and standards. Keep an eye out for it.</p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&#038;p=831</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>html5&#8217;s new &lt;bdi&gt; element</title>
		<link>http://rishida.net/blog/?p=564</link>
		<comments>http://rishida.net/blog/?p=564#comments</comments>
		<pubDate>Wed, 20 Jul 2011 19:01:28 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[writings]]></category>
		<category><![CDATA[bdi]]></category>
		<category><![CDATA[bidi]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=564</guid>
		<description><![CDATA[The html5 specification contains a bunch of new features to support bidirectional text in web pages. Language written with right-to-left scripts, such as Arabic, Hebrew, Persian, Thaana, Urdu, etc., commonly mixes in words or phrases in English or some other language that uses a left-to-right script. The result is called bidirectional or bidi text. HTML [...]]]></description>
				<content:encoded><![CDATA[<p>The html5 specification contains a bunch of new features to support bidirectional text in web pages.  Language written with right-to-left scripts, such as Arabic, Hebrew, Persian, Thaana, Urdu, etc., commonly mixes in words or phrases in English or some other language that uses a left-to-right script.  The result is called bidirectional or bidi text.</p>
<p>HTML 4.01 coupled with the Unicode Bidirectional algorithm already does a pretty good job of managing bidirectional text, but there are still some problems when dealing with embedded text from user input or from stored data.</p>
<h3>The problem</h3>
<p>Here&#8217;s an example where the names of restaurants are added to a page from a database. This is the code, with the Hebrew shown using ASCII:</p>
<pre>
&lt;p>Aroma - 3 reviews&lt;/p>
&lt;p>PURPLE PIZZA - 5 reviews&lt;/p>
</pre>
<p>And here&#8217;s what you&#8217;d expect to see, and what you&#8217;d actually see.</p>
<div class="figure" style="width: 200px; float:left;text-align: center; display:block; margin-right: 10px;margin-left: 10px;">
<div class="figcaption">What it should look like.</div>
<p><img src="http://rishida.net/blog/images/restaurant-right.png" title="AZZIP ELPRUP - 5 reviews" alt="AZZIP ELPRUP - 5 reviews"/></p>
</div>
<div class="figure" style="width: 200px; float:left;text-align: center; display:block; margin-left:10px;margin-right: 10px;">
<div class="figcaption">What it actually looks like.</div>
<p><img src="http://rishida.net/blog/images/restaurant-wrong.png" title="5 - AZZIP ELPRUP reviews" alt="5 - AZZIP ELPRUP reviews"/></p>
</div>
<p><br style="clear:both;"/></p>
<p>The problem arises because the browser thinks that the &#8220;- 5&#8221; is part of the Hebrew text. This is what the Unicode Bidi Algorithm tells it to do, and usually it is correct.  Not here though.</p>
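<p>If you want to see why the algorithm makes that choice, it helps to look at the bidirectional class of each character. The following Python sketch uses the standard <code>unicodedata</code> module; the helper name <code>bidi_classes</code> and the sample string (two Hebrew letters standing in for the restaurant name) are just for illustration.</p>

```python
import unicodedata


def bidi_classes(text):
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# Two Hebrew letters stand in for the restaurant name.
sample = "\u05d0\u05d1 - 5 r"
for ch, cls in bidi_classes(sample):
    print(f"U+{ord(ch):04X} {cls}")
```

<p>The Hebrew letters report the strong class R and the Latin letter reports L, but the spaces (WS), the hyphen (ES) and the digit (EN) have no strong direction of their own. With nothing to separate them from the Hebrew, they get pulled into the right-to-left run.</p>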
<p>So the question is how to fix it?</p>
<h3><code>&lt;bdi></code> to the rescue</h3>
<p>The trick is to use the <span class="kw">bdi</span> element around the text to isolate it from its surrounding content. (<span class="kw">bdi</span> stands for &#8216;bidi-isolate&#8217;.)</p>
<pre>
&lt;p>&lt;bdi>Aroma&lt;/bdi> - 3 reviews&lt;/p>
&lt;p>&lt;bdi>PURPLE PIZZA&lt;/bdi> - 5 reviews&lt;/p>
</pre>
<p>The bidi algorithm now treats the Hebrew and &#8220;- 5&#8221; as separate chunks of content, and orders those chunks per the direction of the overall context, i.e. from left-to-right here.</p>
<p>You&#8217;ll notice that the example above has bdi around the name Aroma too. Of course, you don&#8217;t actually need that, but it won&#8217;t do any harm. On the other hand, it means you can write a script in something like PHP that says:</p>
<pre>
foreach ($restaurants as $restaurant) {
    echo "&lt;bdi>{$restaurant['name']}&lt;/bdi> - {$restaurant['reviews']} reviews";
}
</pre>
<p>This means you can handle any name that comes out of the database, whatever script it is in.</p>
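<p>The same pattern works in any templating language. Here is a minimal Python sketch, with the added precaution of escaping the name before it goes into the page. The function name <code>review_line</code> and the sample data are hypothetical.</p>

```python
from html import escape


def review_line(name, reviews):
    """Wrap a database-supplied name in bdi, escaping it for HTML."""
    return f"<p><bdi>{escape(name)}</bdi> - {int(reviews)} reviews</p>"


# Hypothetical rows, as they might come back from the database.
restaurants = [
    {"name": "Aroma", "reviews": 3},
    {"name": "\u05e4\u05d9\u05e6\u05d4", "reviews": 5},  # placeholder Hebrew name
]
lines = [review_line(r["name"], r["reviews"]) for r in restaurants]
print("\n".join(lines))
```

<p>Because every name gets the same wrapper, the code never has to work out which script a name is written in.</p>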
<p><span class="kw">bdi</span> isn&#8217;t supported fully by all browsers yet, but it&#8217;s coming.</p>
<h3>Things to avoid</h3>
<dl>
<dt>Using the <span class="kw">dir</span> attribute on a <span class="kw">span</span> element</dt>
<dd>
<p>You may think that something like this would work:</p>
<pre>
&lt;p>&lt;span dir=rtl>PURPLE PIZZA&lt;/span> - 5 reviews&lt;/p>
</pre>
<p>But actually that won&#8217;t make any difference, because it doesn&#8217;t isolate the content of the span from what surrounds it.</p>
</dd>
<dt>Using Unicode control characters</dt>
<dd>
<p>You could actually produce the desired result in this case using U+200E LEFT-TO-RIGHT MARK just before the hyphen.</p>
<pre>
&lt;p>PURPLE PIZZA &amp;lrm;- 5 reviews&lt;/p>
</pre>
<p>For a number of reasons, however, <a href="http://www.w3.org/International/questions/qa-bidi-controls">it is better to use markup</a>. Markup is part of the structure of the document, it avoids the need to add logic to the application to choose between LRM and RLM, and it doesn&#8217;t cause search failures like the Unicode characters sometimes do.  Also, the markup can neatly deal with any unbalanced embedding controls inadvertently left in the embedded text.</p>
</dd>
<dt>Using CSS</dt>
<dd>
<p>CSS has also been updated to allow you to isolate text, but you should <a href="http://www.w3.org/International/questions/qa-bidi-css-markup">always use dedicated markup for bidi rather than CSS</a>.  This means that the information about the directionality of the document is preserved even in situations where the CSS is not available.</p>
</dd>
<dt>Using <span class="kw">bdo</span></dt>
<dd>
<p>Although it sounds similar, and it&#8217;s used for bidi text too, the <span class="kw">bdo</span> element is very different. It <a href="http://www.w3.org/International/tutorials/bidi-xhtml/#Slide0420">overrides the bidi algorithm altogether</a> for the text it contains, and doesn&#8217;t isolate its contents from the surrounding text.</p>
</dd>
</dl>
<h3>Using the <span class="kw">dir</span> attribute with <span class="kw">bdi</span></h3>
<p>The <span class="kw">dir</span> attribute can be used on the <span class="kw">bdi</span> element to set the base direction. With simple strings of text like PURPLE PIZZA you don&#8217;t really need it, however if your <span class="kw">bdi</span> element contains text that is itself bidirectional you&#8217;ll want to indicate the base direction.</p>
<p>Until now, you could only set the <span class="kw">dir</span> attribute to <span class="kw">ltr</span> or <span class="kw">rtl</span>.  The problem is that in a situation such as the one described above, where you are pulling strings from a database or user, you may not know  which of these you need to use.</p>
<p>That&#8217;s why html5 has provided a new <span class="kw">auto</span> value for the <span class="kw">dir</span> attribute, and <span class="kw">bdi</span> comes with that set by default. The <span class="kw">auto</span> value tells the browser to look at the first strongly typed character in the element and work out from that what the base direction of the element should be. If it&#8217;s a Hebrew (or Arabic, etc.) character, the element will get a direction of rtl.  If it&#8217;s, say, a Latin character, the direction will be ltr.</p>
<p>There are some rare corner cases where this may not give the desired outcome, but in the vast majority of cases it should produce the expected result.</p>
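<p>The first-strong heuristic is easy to approximate in Python with the standard <code>unicodedata</code> module. This is a simplified sketch rather than the browser&#8217;s exact algorithm: the function name <code>auto_direction</code> is made up, and falling back to <code>ltr</code> when no strong character is found is an assumption for the example.</p>

```python
import unicodedata


def auto_direction(text, default="ltr"):
    """Guess the base direction from the first strong character."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in ("R", "AL"):  # Hebrew, Arabic and similar letters
            return "rtl"
        if cls == "L":          # Latin and other left-to-right letters
            return "ltr"
    return default              # only digits, spaces or punctuation


print(auto_direction("\u05d0\u05d1 - 5"))
print(auto_direction("W3C \u05d0\u05d1"))
```

<p>The second call shows the kind of corner case mentioned above: a string that is mostly Hebrew but happens to begin with a Latin-script name is detected as left-to-right.</p>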
<h3>Want another use case?</h3>
<p>Here&#8217;s another situation where <span class="kw">bdi</span> can be useful.  This time we are constructing multilingual breadcrumbs on the W3C i18n site. The page titles are generated by a script, and this page is in Hebrew, so the base direction is right-to-left.</p>
<p>Again here&#8217;s what you&#8217;d expect to see, and what you&#8217;d actually see.</p>
<div class="figure" style="width: 260px; float:left;text-align: center; margin-left: 10px; margin-right: 10px;">
<div class="figcaption">What it should look like.</div>
<p><img src="http://rishida.net/blog/images/breadcrumbs-right.png" title="Articles &lt; Resources &lt; WERBEH" alt="Articles &lt; Resources &lt; WERBEH"/></p>
</div>
<div class="figure" style="width: 260px; float:left;text-align: center; margin-left: 10px; margin-right: 10px;">
<div class="figcaption">What it actually looks like.</div>
<p><img src="http://rishida.net/blog/images/breadcrumbs-wrong.png" title="Resources &lt; Articles &lt; WERBEH" alt="Resources &lt; Articles &lt; WERBEH"/></p>
</div>
<p><br style="clear:both;"/></p>
<p>Whereas in the previous example we were dealing with a number that was confused about its directionality, here we are dealing with a list of same-script items in a context whose base direction runs the opposite way.</p>
<p>If you wanted to generate markup that would produce the right ordering, whatever combination of titles was thrown at it, you could wrap each title in <span class="kw">bdi</span> elements.</p>
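<p>A script that builds the breadcrumb trail might do that along these lines. This is a hypothetical Python sketch: the function name <code>breadcrumbs</code> and the separator are assumptions, not the code used on the W3C site.</p>

```python
from html import escape


def breadcrumbs(titles, separator=" &gt; "):
    """Join page titles, isolating each one in its own bdi element."""
    return separator.join(f"<bdi>{escape(t)}</bdi>" for t in titles)


print(breadcrumbs(["\u05e2\u05d1\u05e8\u05d9\u05ea", "Resources", "Articles"]))
```

<p>Each title is then ordered as a separate unit according to the base direction of the page, whatever mix of scripts the titles contain.</p>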
<h3>Want more information?</h3>
<p>The inclusion of these features has been championed by Aharon Lanin of Google within the W3C Internationalization (i18n) Working Group.  He is the editor of a W3C Working Draft, <cite><a href="http://www.w3.org/International/docs/html-bidi-requirements/">Additional Requirements for Bidi in HTML</a></cite>, that tracks a range of proposals made to the HTML5 Working Group, giving rationales and recording resolutions.  (The <span class="kw">bdi</span> element started out as a suggestion to include a <a href="http://www.w3.org/International/docs/html-bidi-requirements/#bidi-isolation">ubi attribute</a>.)</p>
<p>If you would like more information on handling bidi in HTML in general, try <cite><a href="http://www.w3.org/International/tutorials/bidi-xhtml/">Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts</a></cite>.</p>
<p>And <a href="http://dev.w3.org/html5/spec/text-level-semantics.html#the-bdi-element">here&#8217;s the description of <span class="kw">bdi</span></a> in the HTML5 spec.</p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&#038;p=564</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>New version of the Internationalization Checker released</title>
		<link>http://rishida.net/blog/?p=562</link>
		<comments>http://rishida.net/blog/?p=562#comments</comments>
		<pubDate>Mon, 18 Jul 2011 18:08:27 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[utilities]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[checker]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=562</guid>
		<description><![CDATA[The &#8216;i18n checker&#8217; is a free service by the W3C that provides information about internationalization-related aspects of your HTML page, and advice on how to improve your use of markup, where needed, to support the multilingual Web. This latest release uses a new user interface and redesigned source code. It also adds a number of [...]]]></description>
				<content:encoded><![CDATA[<p style="float: right; width: 270px; margin-left: 1em; margin-bottom: 1em"><a href="http://rishida.net/blog/images/checker.png"><img src="http://rishida.net/blog/images/checker-small.png" alt="Picture of the page in action." /></a></p>
<p>The &#8216;i18n checker&#8217; is a free service by the W3C that provides information about internationalization-related aspects of your HTML page, and advice on how to improve your use of markup, where needed, to support the multilingual Web.</p>
<p>This latest release uses a new user interface and redesigned source code. It also adds a number of new tests, a file upload facility, and support for HTML5.</p>
<p>This is still a &#8216;pre-final&#8217; release and development continues. There are already plans to add further tests and features, to translate the user interface, to add support for XHTML5 and polyglot documents, to integrate with the W3C Unicorn checker, and to add various other features. At this stage we are particularly interested in receiving user feedback.</p>
<p><a href="http://validator.w3.org/i18n-checker/">Try the checker</a> and let us know if you find any bugs or have any suggestions.</p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&#038;p=562</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New picker: Mongolian</title>
		<link>http://rishida.net/blog/?p=487</link>
		<comments>http://rishida.net/blog/?p=487#comments</comments>
		<pubDate>Sat, 09 Oct 2010 05:16:54 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[utilities]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[character]]></category>
		<category><![CDATA[mongolian]]></category>
		<category><![CDATA[picker]]></category>
		<category><![CDATA[unicode]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=487</guid>
		<description><![CDATA[&#62;&#62; Use it This picker contains characters from the Unicode Mongolian block needed for writing the Mongolian language. It doesn&#8217;t include Sibe, Todo or Manchu characters. Mongolian is a complex script, and I am still familiarising myself with it. This is an initial trial version of a Mongolian picker, and as people use it and [...]]]></description>
				<content:encoded><![CDATA[<div>
<p style="float: right; width: 260px; margin-left: 1em; margin-bottom: 1em"><a href="http://rishida.net/blog/images/mongolianpicker.png"><img src="http://rishida.net/blog/images/mongolianpicker-small.png" alt="Picture of the page in action." /></a></p>
<p style="font-size: 150%"><a href="http://rishida.net/scripts/pickers/mongolian/">&gt;&gt; Use it</a></p>
<p>This picker contains characters from the Unicode Mongolian block needed for writing the Mongolian language. It doesn&#8217;t include Sibe, Todo or Manchu characters. Mongolian is a complex script, and I am still familiarising myself with it. This is an initial trial version of a Mongolian picker, and as people use it and raise feedback I may need to make changes.</p>
<p><strong>About the tool:</strong> Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don&#8217;t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more usable than a regular character map utility.</p>
<p><strong>About this picker:</strong> The output area for this picker is set up for vertical text. However, only Internet Explorer currently supports vertical text display, and only IE8 supports Mongolian&#8217;s left-to-right column progression. In addition, it seems that IE doesn&#8217;t support ltr columns in textarea elements. The bottom line is that, although the output area is the right shape and position for vertical text, mostly the output will be horizontal. You will see vertical text in IE, but the column positions will look wrong. Nevertheless, in any of these cases, when you cut and paste text into another document, the characters will still be correctly ordered.</p>
<p>Consonants are to the left, and in the order listed in the Wikipedia article about Mongolian text. To their right are vowels, then punctuation, spaces and control characters, and number digits. The variation selectors are positioned just below the consonants.</p>
<p>As you mouse over the letters, the various combining forms appear in a column to the far left. This is to help identify characters, for those less familiar with the alphabet.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&#038;p=487</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Webfonts example pages</title>
		<link>http://rishida.net/blog/?p=464</link>
		<comments>http://rishida.net/blog/?p=464#comments</comments>
		<pubDate>Sun, 22 Aug 2010 09:45:59 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[announcement]]></category>
		<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[script notes]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[writings]]></category>
		<category><![CDATA[myanmar]]></category>
		<category><![CDATA[urdu]]></category>
		<category><![CDATA[webfonts]]></category>
		<category><![CDATA[woff]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=464</guid>
		<description><![CDATA[Webfonts, and WOFF in particular, have been in the news again recently, so I thought I should mention that a few days ago I changed my pages describing Myanmar and Arabic-for-Urdu scripts so that you can download the necessary font support for the foreign text, either as a TTF linked font or as WOFF font. [...]]]></description>
				<content:encoded><![CDATA[<p>Webfonts, and WOFF in particular, have been <a href="http://www.w3.org/News/2010#entry-8877">in the news again recently</a>, so I thought I should mention that a few days ago I changed my pages describing Myanmar and Arabic-for-Urdu scripts so that you can download the necessary font support for the foreign text, either as a TTF linked font or as WOFF font.</p>
<p>You can find the Myanmar page at <a href="http://rishida.net/scripts/myanmar/">http://rishida.net/scripts/myanmar/</a>.  Look for the links in the side bar to the right, under the heading &#8220;Fonts&#8221;.</p>
<p>The Urdu page, using the beautiful Nastaliq script, is at <a href="http://rishida.net/scripts/urdu/">http://rishida.net/scripts/urdu/</a>.</p>
<p>(Note that the examples of short vowels don&#8217;t use the Nastaliq style.  Scroll down the page a little further.)</p>
<p>I haven&#8217;t had time to check whether all the OpenType features are correctly rendered, but I&#8217;ve been doing Mac testing of the <a href="http://www.w3.org/International/tests/tests-html-css/list-fonts">i18n webfonts tests</a>, and it looks promising. (More on that later.)  The Urdu font doesn&#8217;t rely on OS rendering, which should help.</p>
<p>Here are some examples of the text on the page:<br />
<img src="images/urdu-myanmar-woff.png" alt="Examples of Urdu script and Myanmar script." /></p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&#038;p=464</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>String analyser and conversion apps updated</title>
		<link>http://rishida.net/blog/?p=460</link>
		<comments>http://rishida.net/blog/?p=460#comments</comments>
		<pubDate>Sun, 22 Aug 2010 09:23:24 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[utilities]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[converter tool]]></category>
		<category><![CDATA[string analyser]]></category>
		<category><![CDATA[unicode]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=460</guid>
		<description><![CDATA[Analyser: http://rishida.net/tools/analysestring/ Converter: http://rishida.net/tools/conversion/ The string analyser tool provides information about the characters in a string. One difference in this version is a new section &#8220;Data input as graphics&#8221;, where you see a horizontal sequence of graphics for each of the characters in the string you are analysing. This can be useful to get a [...]]]></description>
				<content:encoded><![CDATA[<p>Analyser: <a href="http://rishida.net/tools/analysestring/">http://rishida.net/tools/analysestring/</a></p>
<p>Converter: <a href="http://rishida.net/tools/conversion/">http://rishida.net/tools/conversion/</a></p>
<p>The <b class="noticeme">string analyser tool</b> provides information about the characters in a string. One difference in this version is a new section &#8220;Data input as graphics&#8221;, where you see a horizontal sequence of graphics for each of the characters in the string you are analysing. This can be useful to get a screen snap of the characters.  Of course, there is no combining or ligaturing behaviour involved &#8211; just a graphic per character.</p>
<p>You can reverse the character order for right-to-left scripts.</p>
<p>Another difference is that you can explode example text in the notes.  Take <a href="http://rishida.net/tools/analysestring/index.php?list=%D8%A2">this example</a>: if you click on the Arabic word for Koran (red word near the bottom of the notes), you&#8217;ll see a pop-up window in the bottom right corner of the window that lists the characters in that word.</p>
<p>The other change is that the former &#8220;Related links&#8221; section in the sidebar is now called &#8220;Do more&#8221;, and the links carry the string you are analysing to the Converter or UniView apps.</p>
<p>Oh, and the page now remembers the options you set between refreshes, which makes life much easier.</p>
<p>The <b class="noticeme">converter</b> tool converts between characters and various escaped character formats.  It was changed so that the &#8220;View names&#8221; button sends the characters to the string analyser tool.  This means that you&#8217;ll now see graphics for the characters, and that, once on the string analyser page, you can change the amount of information displayed for each character (including showing font-based characters, if you need to).</p>
<p>I also fixed a bug related to the UTF-8 and UTF-16 input.  Including spaces after the code values no longer fires off a bug.</p>
<p><i class="footnote">PS: The string analyser tool has graphics for all new Unicode 6.0 characters, but I haven&#8217;t updated the data for those characters yet.  I was planning to do so with the next release of UniView, which should be in October, when the final Unicode database is available.</i></p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&#038;p=460</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An Introduction to Indic Scripts updated</title>
		<link>http://rishida.net/blog/?p=456</link>
		<comments>http://rishida.net/blog/?p=456#comments</comments>
		<pubDate>Mon, 16 Aug 2010 06:26:50 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[announcement]]></category>
		<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[script notes]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[writings]]></category>
		<category><![CDATA[character]]></category>
		<category><![CDATA[indian]]></category>
		<category><![CDATA[indic]]></category>
		<category><![CDATA[scripts]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=456</guid>
		<description><![CDATA[http://rishida.net/scripts/indic-overview/ I finally got around to refreshing this article, by converting the Bengali, Malayalam and Oriya examples to Unicode text. Back when I first wrote the article, it was hard to find fonts for those scripts. I also added a new feature: In the HTML version, click on any of the examples in indic text [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://rishida.net/scripts/indic-overview/">http://rishida.net/scripts/indic-overview/</a></p>
<p>I finally got around to refreshing this article, by converting the Bengali, Malayalam and Oriya examples to Unicode text. Back when I first wrote the article, it was hard to find fonts for those scripts.</p>
<p>I also added a new feature: In the HTML version, click on any of the examples in indic text and a pop-up appears at the bottom right of the page, showing which characters the example is composed of.  The pop-up lists the characters in order, with Unicode names, and shows the characters themselves as graphics.</p>
<p>I have not yet updated this article&#8217;s incarnation as <a href="http://www.unicode.org/notes/tn10/">Unicode Technical Note #10</a>.  The Indian Government also used this article, and made a number of small changes.  I have yet to incorporate those, too.</p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&#038;p=456</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What does it take to display non-Latin HTML pages?</title>
		<link>http://rishida.net/blog/?p=445</link>
		<comments>http://rishida.net/blog/?p=445#comments</comments>
		<pubDate>Wed, 04 Aug 2010 18:14:49 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[writings]]></category>
		<category><![CDATA[character encoding]]></category>
		<category><![CDATA[fonts]]></category>
		<category><![CDATA[unicode]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=445</guid>
		<description><![CDATA[I recently came across an email thread where people were trying to understand why they couldn&#8217;t see Indian content on their mobile phones. Here are some notes that may help to clarify the situation. They are not fully developed! Just rough jottings, but they may be of use. Let&#8217;s assume, for the sake of an [...]]]></description>
				<content:encoded><![CDATA[<p>I recently came across an email thread where people were trying to understand why they couldn&#8217;t see Indian content on their mobile phones.  Here are some notes that may help to clarify the situation.  They are not fully developed! Just rough jottings, but they may be of use.</p>
<p>Let&#8217;s assume, for the sake of an example, that the goal is to display a page in Hindi, which is written using the devanagari script. These principles, however, apply to one degree or another to all languages that use characters outside the ASCII range.</p>
<p>Let&#8217;s start by reviewing some fundamental concepts: character encodings and fonts.  If you are familiar with these concepts, skip to the next heading.</p>
<h4>Character encodings and fonts</h4>
<p>Content is composed of a sequence of <em>characters</em>. Characters represent letters of the alphabet, punctuation, etc. But content is stored in a computer as a sequence of bytes, which are numeric values. Sometimes more than one byte is used to represent a single character. Like codes used in espionage, the way that the sequence of bytes is converted to characters depends on what <em>key</em> was used to encode the text. In this context, that key is called a <em>character encoding</em>.</p>
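<p>To make the difference between bytes and characters concrete, here is a minimal Python sketch. It is my own illustration, not taken from any tool mentioned here, and the variable names are just labels I chose. It shows the bytes used to store the Latin letter A and the devanagari letter HA (U+0939) under two different encodings.</p>

```python
# Illustration only: the same character becomes different bytes
# depending on which character encoding (the "key") is used.
ha = "\u0939"  # DEVANAGARI LETTER HA, written as an escape to keep the source ASCII

ascii_letter_utf8 = "A".encode("utf-8")  # one byte
ha_utf8 = ha.encode("utf-8")             # three bytes
ha_utf16 = ha.encode("utf-16-be")        # two bytes

print(ascii_letter_utf8.hex())  # 41
print(ha_utf8.hex())            # e0a4b9
print(ha_utf16.hex())           # 0939
```

<p>One character, three different byte sequences: without knowing which encoding was used, the bytes alone don&#8217;t tell you which character was meant.</p>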
<p>There are many character encodings to choose from.</p>
<p>The person who created the content of the page you want to read should have used a character encoding that supports devanagari characters, but it should also be a character encoding that is widely recognised by browsers and available in editors. By far the best character encoding to use (for any language in the world) is called UTF-8.  </p>
<p>UTF-8 is strongly recommended by the HTML5 draft specification.</p>
<p>There should be a <em>character encoding declaration</em> associated with the HTML code of your page to say what encoding was used.  Otherwise the browser may not interpret the bytes correctly. It is also crucial that the text is actually <em>stored</em> in that encoding too.  That means that the person creating the content must choose that encoding when they save the page from their editor.  It&#8217;s not possible to change the encoding of text simply by changing the character encoding declaration in the HTML code, because the declaration is there just to indicate to the browser what key to use to get at the already encoded text.</p>
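<p>The following Python sketch illustrates why the declaration and the stored bytes have to agree. The function <code>read_page</code> is a hypothetical stand-in for a browser that simply trusts the declared encoding; real browsers do considerably more than this.</p>

```python
# Illustration only: a declaration labels the bytes, it does not convert them.
text = "\u0939\u093f\u0928\u094d\u0926\u0940"  # the Hindi word for "Hindi"
stored = text.encode("utf-8")  # what the editor actually saved to disk


def read_page(raw_bytes, declared_encoding):
    """Decode the bytes the way a reader trusting the declaration would."""
    return raw_bytes.decode(declared_encoding)


as_declared_utf8 = read_page(stored, "utf-8")
as_declared_latin1 = read_page(stored, "iso-8859-1")

print(as_declared_utf8 == text)    # True: declaration matches the stored bytes
print(as_declared_latin1 == text)  # False: same bytes, wrong key
print(len(text), len(as_declared_latin1))  # 6 characters become 18
```

<p>Changing the label from one encoding to the other changed nothing in <code>stored</code>; it only changed how those same bytes were read.</p>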
<p>It&#8217;s one thing for the browser to know how to interpret the bytes to represent your text, but the browser must also have a way to make those characters stored in memory appear on the screen.</p>
<p>A font is essential here. Fonts contain instructions for displaying a character or a sequence of characters so that you can read them.  The visual representation of a character is called a glyph. The font converts characters to glyphs. </p>
<p>The font has tables to map the bytes in memory to text. To do this, the font needs to recognise the character encoding your page uses, and have the necessary tables to convert the characters to glyphs.  It is important that the font used can work with the character encoding used in the page you want to view.  Most fonts these days support UTF-8 encoded text.</p>
<p>Very simple fonts contain one glyph for each letter of the alphabet.  This may work for English, but it wouldn&#8217;t work for a complex script such as devanagari.  In these scripts the positioning and interaction of characters has to be modified according to the context in which they are displayed.  This means that the font needs additional information about how to choose and position glyphs depending on the context.  That information may be built into the font itself, or the font may rely on information on your system.</p>
<h4>Character encoding support</h4>
<p>The browser needs to be able to recognise the character encoding used in order to correctly interpret the mapping between bytes and characters.</p>
<p>If the character encoding of the page is incorrectly declared, or not declared at all, there will be problems viewing the content. Typically, a browser allows the user to manually apply a particular encoding by selecting the encoding from the menu bar.</p>
<p>All browsers should support the UTF-8 character encoding.  </p>
<p>Sometimes people use an encoding that is not designed for devanagari support with a font that produces the right glyphs nevertheless.  Such approaches are fraught with issues and present poor interoperability on several levels. For example, the content can only be interpreted correctly by applying the specifically designed font; no other font will do if that font is not available. Also, the meaning of the text cannot be derived by machine processing, for web searches, etc., and the data cannot be easily copied or merged with other text (eg. to quote a sentence in another article that doesn&#8217;t use the same encoding). This practice seriously damages the openness of the Web and should be avoided at all costs.</p>
<h4>System font support</h4>
<p>Usually, a web page will rely on the operating system to provide a devanagari font.  If there isn&#8217;t one, users won&#8217;t be able to see the Hindi text.  The browser doesn&#8217;t supply the font, it picks it up from whatever platform the browser is running on.</p>
<p>If the browser is running on a desktop computer, there may be a font already installed. If not, it should be possible to download free or commercial fonts and install them. If the user is viewing the page on a mobile device, it may currently be difficult to download and install one.</p>
<p>If there are several devanagari fonts on a system, the browser will usually pick one by default.  However, if the web page uses CSS to apply styling to the page, the CSS code may specify one or more particular fonts to use for a given piece of content. If none of these are available on the system, most browsers will fall back to the default; Internet Explorer, however, will show square boxes instead.</p>
<h4>Webfonts</h4>
<p>Another way of getting a font onto the user&#8217;s system is to download it with the page, just like images are downloaded with the page.  This is done using CSS code. The CSS code to do this has been defined for some years, but unfortunately most browsers&#8217; implementation of this feature is still problematic.</p>
<p>Recently a number of major browsers have begun to support download of raw truetype or opentype fonts. Internet Explorer is not one of those. This involves simply loading the ordinary font onto a server and downloading to the browser when the page is displayed. Although the font may be cached as the user moves from page to page, there may still be some significant issues when dealing with complex scripts or Far Eastern languages (such as Chinese, Japanese and Korean) due to the size of the fonts used. The size of these fonts can often be counted in megabytes rather than kilobytes.</p>
<p>It is important to observe licensing restrictions when making fonts available for download in this way. The CSS mechanism doesn&#8217;t contain any restrictions related to font licences, but there are ways of preparing fonts for download that take into consideration some aspects of this issue &#8211; though not enough to provide a watertight restriction on font usage. </p>
<p>Microsoft makes available a program to create .eot fonts from ordinary true/opentype fonts. Eot font files can apply some usage restrictions and also subset the font to include only the characters used on the page.  The subsetting feature is useful when only a small amount of text appears in a given font, but for a whole page in, say, devanagari script it is of little use &#8211; particularly if the user is to input text in forms.  The biggest problem with .eot files, however, is that they are only supported by Internet Explorer, and there are no plans to support .eot format on other browsers.</p>
<p>The W3C is currently working on the WOFF format. Fonts converted to WOFF format can have some gentle protection with regard to use, and also apply significant compression to the font being downloaded.  WOFF is currently only supported by Firefox, but all other major browsers are expected to provide support for the new format.</p>
<p>For this to work well, all browsers must support the same type of font download.</p>
<h4>Beyond fonts</h4>
<p>Complex scripts, such as those used for Indic and South East Asian languages, need to choose glyph shapes and positions and substitute ligatures, etc. according to the context in which characters are used. These adjustments can be accomplished using the features of OpenType fonts.  The browser must be able to implement those opentype features.</p>
<p>Often a font will also rely on operating system support for some subset of the complex script rendering.  For example, a devanagari font may rely on the Windows uniscribe dll for things like positioning of left-appended vowel signs, rather than encoding that behaviour into the font itself.  This reduces the size and complexity of the font, but exposes a problem when using that font on a variety of platforms.  Unless the operating system can provide the same rendering support, the text will look only partially correct. Mobile devices must either provide something similar to uniscribe, or fonts used on the mobile device must include all needed rendering features.</p>
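<p>As a rough illustration of the left-appended vowel sign case, the Python sketch below lists the characters of the Hindi word for &#8220;Hindi&#8221; in the order in which they are stored in memory. It only uses the standard <code>unicodedata</code> module; the variable names are my own.</p>

```python
import unicodedata

# Illustration only: characters are stored in logical (spoken) order,
# which is not always the order in which their glyphs are drawn.
word = "\u0939\u093f\u0928\u094d\u0926\u0940"  # the Hindi word for "Hindi"

names = [unicodedata.name(ch) for ch in word]
for ch, name in zip(word, names):
    print(f"U+{ord(ch):04X} {name}")
```

<p>The second character in memory is DEVANAGARI VOWEL SIGN I, stored after the HA it follows in speech, yet it has to be displayed to the left of the HA. Something, whether the font or the operating system, has to do that reordering.</p>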
<p>Browsers that do font linking must also support the necessary opentype features and obtain functionality from the OS rendering support where needed.</p>
<p>If tools are developed to subset webfonts, the subsetting must not remove the rendering logic needed for correct display of the text.</p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&#038;p=445</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Spell checking and language tags in XMetal: revisited.</title>
		<link>http://rishida.net/blog/?p=330</link>
		<comments>http://rishida.net/blog/?p=330#comments</comments>
		<pubDate>Tue, 15 Dec 2009 17:22:09 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=330</guid>
		<description><![CDATA[I just received email from Derek Reid in the XMetal team at JustSystems to say that they have significantly improved the way the XMetal XML editor uses xml:lang attributes in the source code in conjunction with its spell-checker. Basically, XMetal will switch spell-checking dictionaries based on the xml:lang settings in the markup. It also supports [...]]]></description>
				<content:encoded><![CDATA[<p>I just received email from Derek Reid in the XMetal team at JustSystems to say that they have significantly improved the way the XMetal XML editor uses xml:lang attributes in the source code in conjunction with its spell-checker.  </p>
<p>Basically, XMetal will switch spell-checking dictionaries based on the xml:lang settings in the markup. It also supports xml:lang="" and xml:lang="zxx" for places you don&#8217;t want to spell-check. It even does this when using interactive red squiggles to highlight potential misspellings.</p>
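<p>I don&#8217;t know how XMetal implements this internally, but the general idea can be sketched in a few lines of Python. Everything here is hypothetical: the function <code>dictionary_plan</code>, the element names and the sample text are my own, and the sketch only assumes that xml:lang values are inherited by child elements and that an empty value or <code>zxx</code> means &#8220;don&#8217;t spell-check&#8221;.</p>

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
NO_CHECK = {"", "zxx"}  # values meaning "skip spell-checking here"


def dictionary_plan(element, inherited=None):
    """Yield (text, dictionary) pairs; dictionary is None where checking is skipped."""
    lang = element.get(XML_LANG, inherited)  # inherit from the parent if not set
    dictionary = None if lang in NO_CHECK else lang
    if element.text and element.text.strip():
        yield element.text.strip(), dictionary
    for child in element:
        yield from dictionary_plan(child, lang)
        # Text after a child element belongs to this element's language.
        if child.tail and child.tail.strip():
            yield child.tail.strip(), dictionary


# Hypothetical sample: an English paragraph with a French quote and some code.
para = ET.Element("p", {XML_LANG: "en"})
para.text = "The French say "
quote = ET.SubElement(para, "q", {XML_LANG: "fr"})
quote.text = "bonjour"
quote.tail = " and the code is "
code = ET.SubElement(para, "code", {XML_LANG: "zxx"})
code.text = "xyzzy"

plan = list(dictionary_plan(para))
for text, dictionary in plan:
    print(repr(text), dictionary)
```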
<p>I wrote a <a href="http://rishida.net/blog/?p=83">blog post</a> about this in 2007, when the capability was only partially developed. Derek says:</p>
<blockquote><p>I read this post when you first wrote it, and after getting feedback from a large number of our clients I was finally able to convince our development and management teams to properly support language auto-switching for spell checking in conjunction with xml:lang attribute values in our product.</p>
<p>We made a big effort to deal with these limitations during the past year and our XMetaL Author Enterprise 6.0 release addresses most or all of them.</p>
<p>If you are interested, I have posted instructions on how to configure XMetaL Author Enterprise 6.0 to properly support this feature:<br />
<a href="http://forums.xmetal.com/index.php/topic,539.msg1701">http://forums.xmetal.com/index.php/topic,539.msg1701</a></p>
</blockquote>
<p>I haven&#8217;t had a chance to try it out yet, but it sounds exciting.  Now how about DreamWeaver&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&#038;p=330</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Language Subtag Lookup tool updated</title>
		<link>http://rishida.net/blog/?p=328</link>
		<comments>http://rishida.net/blog/?p=328#comments</comments>
		<pubDate>Thu, 10 Dec 2009 11:56:40 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[utilities]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=328</guid>
		<description><![CDATA[&#62;&#62; Use it About the tool: BCP 47 language tags are built from subtags in the IANA Subtag Registry. This tool helps you find or look up subtags and check for errors in language tags. It also provides information to guide your choices. Latest changes: I reworked the informational text that accompanies macrolanguages, their encompassed [...]]]></description>
				<content:encoded><![CDATA[<p style="font-size: 150%"><a href="http://rishida.net/utils/subtags/">&gt;&gt; Use it</a></p>
<p style="float: right; width: 260px; margin-left: 1em; margin-bottom: 1em"><a href="http://rishida.net/blog/images/subtags-0912.png"><img src="http://rishida.net/blog/images/subtags-0912-small.png" alt="Picture of the page in action." /></a></p>
<p><strong>About the tool:</strong> BCP 47 language tags are built from subtags in the IANA Subtag Registry. This tool helps you find or look up subtags and check for errors in language tags. It also provides information to guide your choices.</p>
<p><strong>Latest changes:</strong> I reworked the informational text that accompanies macrolanguages, their encompassed languages, and extlang subtags.  As part of that, I changed the code to allow for highlighting of specific cases. For example, where legacy may dictate that the macrolanguage subtag (zh) is more useful for Mandarin Chinese than the more specific tags (cmn or zh-cmn).</p>
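<p>For readers unfamiliar with how a language tag breaks down into subtags, here is a deliberately simplified Python sketch. It is not the code behind the lookup tool: the function <code>split_language_tag</code> is hypothetical, it classifies subtags purely by their length and shape, and it does not consult the IANA Subtag Registry, so it cannot tell you whether a subtag actually exists.</p>

```python
def split_language_tag(tag):
    """Split a language tag into subtags by shape only (no registry lookup)."""
    subtags = tag.split("-")
    parts = {"language": subtags[0].lower(), "extlang": None,
             "script": None, "region": None, "rest": []}
    for sub in subtags[1:]:
        is_region = (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit())
        nothing_later = parts["script"] is None and parts["region"] is None
        if parts["rest"]:
            parts["rest"].append(sub)  # once past the region, keep everything as-is
        elif len(sub) == 3 and sub.isalpha() and nothing_later and parts["extlang"] is None:
            parts["extlang"] = sub.lower()
        elif len(sub) == 4 and sub.isalpha() and nothing_later:
            parts["script"] = sub.title()
        elif is_region and parts["region"] is None:
            parts["region"] = sub.upper()
        else:
            parts["rest"].append(sub)  # variants, extensions, private use
    return parts


print(split_language_tag("zh-cmn-Hant-TW"))
print(split_language_tag("cmn"))
print(split_language_tag("es-419"))
```

<p>Note that <code>zh-cmn</code> and <code>cmn</code> come out with different shapes (a language plus an extlang versus a language on its own), which is exactly the kind of case where the tool&#8217;s informational text is meant to guide your choice.</p>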
<p>I simplified the intro to the page, but added a link to the new article <a href="http://www.w3.org/International/questions/qa-choosing-language-tags">Choosing a Language Tag</a>, which provides useful step-by-step guidelines on creating language tags.</p>
<p>I also changed the user interface somewhat. The input fields are easier to work with and take up less vertical space.  Also, you can now submit a query by simply hitting return after typing into a field.  I had originally required you to click on a submit button so that all values in other fields would be retained when the answer is shown &#8211; this was so that while checking various subtags you could build up a language tag in the Check field for later checking.  I just found that the annoyance of continually having to resubmit after forgetting to click on the submit button wasn&#8217;t worth the extra functionality (and I was also encouraged to do so by feedback from Bert Bos).</p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&#038;p=328</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
