<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>&#62;&#62; blog</title>
	<atom:link href="http://rishida.net/blog/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://rishida.net/blog</link>
	<description>News of changes to my main site, and W3C related posts.</description>
	<lastBuildDate>Thu, 22 Oct 2009 15:01:27 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.5</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Conversion tool updated to 7.0.1</title>
		<link>http://rishida.net/blog/?p=313</link>
		<comments>http://rishida.net/blog/?p=313#comments</comments>
		<pubDate>Mon, 14 Sep 2009 13:07:45 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[utilities]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=313</guid>
		<description><![CDATA[Removed the &#8216;beta&#8217; from the version number and replaced it with .0.1. The new version converts u+&#8230; (i.e. lowercase u) as well as U+&#8230; notation.
See http://rishida.net/tools/conversion/
Thanks to Martin Dürst for the suggestion.
]]></description>
			<content:encoded><![CDATA[<p>Removed the &#8216;beta&#8217; from the version number and replaced it with .0.1. The new version converts u+&#8230; (i.e. lowercase u) as well as U+&#8230; notation.</p>
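<p>The matching involved is easy to sketch. The following JavaScript is a minimal illustration, not the converter&#8217;s actual code. The helper name <code>convertUPlus</code> is hypothetical, and the sketch assumes a code point is written as U+ or u+ followed by 4 to 6 hex digits.</p>

```javascript
// Hypothetical helper, not the converter's code: replace each U+XXXX or
// u+XXXX sequence (4 to 6 hex digits) with the character it names.
function convertUPlus(text) {
  return text.replace(/[Uu]\+([0-9A-Fa-f]{4,6})/g, (match, hex) => {
    const cp = parseInt(hex, 16);
    // Leave values beyond U+10FFFF untouched instead of throwing a RangeError.
    return cp <= 0x10ffff ? String.fromCodePoint(cp) : match;
  });
}

console.log(convertUPlus("u+0041 U+00E9"));
```

<p>Matching <code>[Uu]</code> is all it takes to accept both cases. Leaving out-of-range values unchanged, rather than failing, keeps the rest of the input convertible.</p>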
<p>See <a href="http://rishida.net/tools/conversion/">http://rishida.net/tools/conversion/</a></p>
<p>Thanks to Martin Dürst for the suggestion.</p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&amp;p=313</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Further developments to Language Tag Lookup tool</title>
		<link>http://rishida.net/blog/?p=304</link>
		<comments>http://rishida.net/blog/?p=304#comments</comments>
		<pubDate>Wed, 19 Aug 2009 14:45:38 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[utilities]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=304</guid>
		<description><![CDATA[&#62;&#62; Use it

I have added a number of new features to my lookup tool to help with choosing language tags. There is additional information available when you look up subtags (such as what to use if the subtag is deprecated, and which subtags a macrolanguage encompasses), and more well-formedness tests, with clearer explanations [...]]]></description>
			<content:encoded><![CDATA[<p style="font-size: 150%"><a href="http://rishida.net/utils/subtags/">&gt;&gt; Use it</a></p>
<p style="float: right; width: 260px; margin-left: 1em; margin-bottom: 1em"><a href="http://rishida.net/blog/images/subtags-0908b.png"><img src="http://rishida.net/blog/images/subtags-0908b-small.png" alt="Picture of the page in action." /></a></p>
<p>I have added a number of new features to my lookup tool to help with choosing language tags. Looking up a subtag now gives additional information (such as what to use instead if the subtag is deprecated, and which subtags a macrolanguage encompasses), and there are more well-formedness tests, with clearer explanations of any problems found.  <a href="http://rishida.net/utils/subtags/index.php?lookup=mg+valencia+avst+in&#038;submit=Look+up">Example</a>.</p>
<p>This should make it a lot more useful to people who haven&#8217;t read BCP 47 and want to create language tags. Hopefully I&#8217;ll soon also write, and link to, an article that explains step by step, from the ground up, how to use subtags, to complement the tool.</p>
<p>For further assistance, you can now link from a language subtag result to the SIL Ethnologue, to make it easier to check whether that subtag really does refer to the language you were thinking of.</p>
<p>In addition, script subtag results link to Unicode blocks in UniView.</p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&amp;p=304</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Language tag lookup tool: bigger and better</title>
		<link>http://rishida.net/blog/?p=299</link>
		<comments>http://rishida.net/blog/?p=299#comments</comments>
		<pubDate>Thu, 13 Aug 2009 16:22:07 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[utilities]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=299</guid>
		<description><![CDATA[&#62;&#62; Use it

The IANA Subtag Registry has recently been updated to contain 220 extlang subtags and the ISO 639-3 language subtags, taking the total number of subtags to almost 8,000.
I have produced a new version of my lookup tool to help with language tagging. In addition to helping you find subtags and look up the [...]]]></description>
			<content:encoded><![CDATA[<p style="font-size: 150%"><a href="http://rishida.net/utils/subtags/">&gt;&gt; Use it</a></p>
<p style="float: right; width: 260px; margin-left: 1em; margin-bottom: 1em"><a href="http://rishida.net/blog/images/subtags-0908.png"><img src="http://rishida.net/blog/images/subtags-0908-small.png" alt="Picture of the page in action." /></a></p>
<p>The IANA Subtag Registry has recently been updated to contain 220 extlang subtags and the ISO 639-3 language subtags, taking the total number of subtags to almost 8,000.</p>
<p>I have produced a new version of my lookup tool to help with language tagging. In addition to helping you find subtags and look up their meanings, it now helps you check whether a language tag is well-formed.</p>
<p>The tool provides access to all currently defined subtags, including the new extlang subtags.</p>
<p><strong>Parsing language tags.</strong> In addition to trying to make the user interface friendlier, I added the ability to parse hyphenated tags, discover their structure and check for errors. I&#8217;m not claiming that this release of the parser handles every corner case, but it should report most of the typical errors.</p>
<p>It reports errors for the following:</p>
<ul>
<li>subtags that are not in the registry (by type)</li>
<li>incorrectly ordered subtags</li>
<li>duplicate variant subtags, and multiple subtags of other types</li>
<li>overlong private use subtags</li>
</ul>
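<p>The checks listed above can be approximated with a short JavaScript sketch. It is not the tool&#8217;s parser: it classifies each subtag only by its shape in the BCP 47 syntax, so it cannot tell whether a subtag is actually in the registry, and, like the tool, it ignores extensions. The names <code>classify</code> and <code>checkTag</code> are hypothetical.</p>

```javascript
// Hypothetical sketch, not the lookup tool's parser: check the well-formedness
// of a language tag using only the shape of each subtag (BCP 47 syntax), with
// no registry lookup. Extensions and grandfathered tags are not handled.
const ORDER = ["language", "extlang", "script", "region", "variant"];

// Guess a subtag's type from its length and characters. The first subtag is
// always treated as the primary language subtag.
function classify(sub, position) {
  if (position === 0) {
    return /^([a-z]{2,3}|[a-z]{5,8})$/.test(sub) ? "language" : "unknown";
  }
  if (/^[a-z]{3}$/.test(sub)) return "extlang";
  if (/^[a-z]{4}$/.test(sub)) return "script";
  if (/^([a-z]{2}|[0-9]{3})$/.test(sub)) return "region";
  if (/^([a-z0-9]{5,8}|[0-9][a-z0-9]{3})$/.test(sub)) return "variant";
  return "unknown";
}

function checkTag(tag) {
  const errors = [];
  const parts = tag.toLowerCase().split("-");
  const counts = new Map();   // non-variant type -> number seen so far
  const variants = new Set(); // variant subtags seen so far
  let highestRank = -1;       // furthest position in ORDER reached so far

  for (let i = 0; i < parts.length; i++) {
    const sub = parts[i];
    if (sub === "x") {
      // Private use: every remaining subtag must be 1 to 8 alphanumerics.
      const rest = parts.slice(i + 1);
      if (rest.length === 0) errors.push("empty private use sequence");
      for (const p of rest) {
        if (!/^[a-z0-9]{1,8}$/.test(p)) errors.push(`bad private use subtag '${p}'`);
      }
      break;
    }
    const type = classify(sub, i);
    if (type === "unknown") {
      errors.push(`unrecognised subtag '${sub}'`);
      continue;
    }
    const rank = ORDER.indexOf(type);
    if (rank < highestRank) errors.push(`${type} '${sub}' is out of order`);
    highestRank = Math.max(highestRank, rank);
    if (type === "variant") {
      if (variants.has(sub)) errors.push(`duplicate variant '${sub}'`);
      variants.add(sub);
    } else {
      const n = (counts.get(type) || 0) + 1;
      counts.set(type, n);
      if (n > 1) errors.push(`more than one ${type} subtag`);
    }
  }
  return errors;
}

console.log(checkTag("de-419-DE-alt"));
```

<p>For de-419-DE-alt, the tag used in the example, this sketch reports two problems: more than one region subtag, and &#8216;alt&#8217;, which by its shape looks like an extlang, appearing after the region. A registry-based parser could give a more precise diagnosis.</p>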
<p>Try <a href="http://rishida.net/utils/subtags/index.php?check=de-419-DE-alt&#038;submit=Check">this example</a>. </p>
<p>It doesn&#8217;t yet handle extensions, but then there aren&#8217;t any valid ones to handle yet anyway.</p>
<p>I hope that&#8217;s useful.</p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&amp;p=299</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>UniView 5.2(beta)a: Graphics as default</title>
		<link>http://rishida.net/blog/?p=293</link>
		<comments>http://rishida.net/blog/?p=293#comments</comments>
		<pubDate>Fri, 31 Jul 2009 06:33:19 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[utilities]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=293</guid>
		<description><![CDATA[&#62;&#62; See what it can do
&#62;&#62; Use it

Following hot on the heels of the last release come some further significant changes to UniView aimed at making it easier to use as Unicode grows.
The big change is that UniView now starts up in graphics mode by default. This means that pages load more slowly, but (especially [...]]]></description>
			<content:encoded><![CDATA[<p style="font-size: 110%"><a href="http://rishida.net/scripts/uniview/help">&gt;&gt; See what it can do</a></p>
<p style="font-size: 150%"><a href="http://rishida.net/scripts/uniview/">&gt;&gt; Use it</a></p>
<p style="float: right; width: 260px; margin-left: 1em; margin-bottom: 1em"><a href="http://rishida.net/blog/images/uniview52betaa.png"><img src="http://rishida.net/blog/images/uniview52betaa-small.png" alt="Picture of the page in action." /></a></p>
<p>Following hot on the heels of the last release come some further significant changes to UniView aimed at making it easier to use as Unicode grows.</p>
<p>The big change is that UniView now starts up in graphics mode by default. This means that pages load more slowly, but (especially with the continuing growth of Unicode) also means that you are more likely to be able to see the characters you are looking for. It&#8217;s easy to switch between modes at any point, using the &#8220;<span style="color:brown">Use graphics</span>&#8221; checkbox.  (And if you prefer font glyphs as the default, you just need to <a href="http://rishida.net/scripts/uniview/help.html#defaults">change the URI</a> in your bookmarked link slightly, and you can continue to work that way.)</p>
<p>To facilitate this change, I created my own graphics for a number of blocks that decodeunicode does not yet cover, or no longer fully covers. The blocks for which I provided graphics are <strong>Latin Extended-C, Latin Extended-D, Latin Extended Additional, Cyrillic Supplement, Cyrillic Extended-B, Modifier Tone Letters, Tibetan, Malayalam, Saurashtra, Ol Chiki, Myanmar, Kayah Li, Cham, Rejang, Vai, Supplemental Punctuation</strong>, and <strong>Miscellaneous Symbols and Arrows.</strong></p>
<p>There are still many characters for which there are no graphics (especially the new characters in Unicode 5.2), but coverage is much better than it was. As I find more fonts, I will be able to create graphics for the remaining characters.</p>
<p>I also put a grey box around the characters in tables. This is particularly useful if there are no graphics or font glyphs for a block or range of characters, as it makes it easier to locate the character you are looking for.</p>
<p>I also fixed a bug that was preventing Chrome, Safari and IE from displaying the first two Latin blocks. I think the bug was actually in the Unicode data file.</p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&amp;p=293</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>UniView 5.2beta available</title>
		<link>http://rishida.net/blog/?p=289</link>
		<comments>http://rishida.net/blog/?p=289#comments</comments>
		<pubDate>Mon, 27 Jul 2009 12:46:19 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[utilities]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=289</guid>
		<description><![CDATA[&#62;&#62; See what it can do
&#62;&#62; Use it

With the family now in Japan, I had some extra time to spare this weekend, so I upgraded UniView to handle all the proposed characters for Unicode 5.2.  
The properties for new and modified characters are still in beta and are not officially stable, but the [...]]]></description>
			<content:encoded><![CDATA[<p style="font-size: 110%"><a href="http://rishida.net/scripts/uniview/help">&gt;&gt; See what it can do</a></p>
<p style="font-size: 150%"><a href="http://rishida.net/scripts/uniview/">&gt;&gt; Use it</a></p>
<p style="float: right; width: 260px; margin-left: 1em; margin-bottom: 1em"><a href="http://rishida.net/blog/images/uniview52beta.png"><img src="http://rishida.net/blog/images/uniview52beta-small.png" alt="Picture of the page in action." /></a></p>
<p>With the family now in Japan, I had some extra time to spare this weekend, so I upgraded UniView to handle all the proposed characters for Unicode 5.2.  </p>
<p>The properties for new and modified characters are still in beta and are not officially stable, but the character allocations should be stable at this point. UniView therefore alerts you if you are looking at a new character.</p>
<p>If the Unicode database information has changed for a given character, you are also warned, and provided with a link that points to the previous information for that character. These warnings will be removed from UniView when Unicode 5.2 is released.</p>
<p>Of course, you are unlikely to be able to actually see the new characters themselves, unless you are lucky enough to have a very new font to hand.  The graphic alternatives are not available yet for these characters.  I&#8217;m wondering whether it&#8217;s possible for me to do something about that, but that will take a little longer. In the meantime, you might find it more useful to view blocks in list view. (Click on &#8216;Show range as list&#8217;).</p>
<p>This release also fixes a few small bugs in the HTML and JavaScript code.</p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&amp;p=289</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Berlin &amp; Potsdam photos</title>
		<link>http://rishida.net/blog/?p=284</link>
		<comments>http://rishida.net/blog/?p=284#comments</comments>
		<pubDate>Sat, 11 Jul 2009 05:21:32 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[photos]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=284</guid>
		<description><![CDATA[
 
 
  Berliner Dom, from the River Spree
  Flickr photostream.
 
 

Last month I was in Berlin for Localization World, and then in Potsdam to talk about Japanese Layout.
I didn&#8217;t get much time for photos in Berlin.  These photos were mostly taken during the dinner cruise. And in Potsdam it [...]]]></description>
			<content:encoded><![CDATA[<div style="float: left; margin-right: 20px; margin-bottom: 15px;">
<p> <a href="http://www.flickr.com/photos/ishida/1839598224/" title="photo sharing"><img src="http://farm4.static.flickr.com/3347/3641589309_9a44bbcbc4_m.jpg" alt="" style="border: solid 2px #000000;" /></a><br />
 <span style="font-size: 0.9em; margin-top: 0px;"><br />
  <a href="http://www.flickr.com/photos/ishida/3641589309/">Berliner Dom, from the River Spree</a><br />
  <a href="http://www.flickr.com/photos/ishida/">Flickr photostream</a>.<br />
 </span>
 </p>
</div>
<p>Last month I was in Berlin for Localization World, and then in Potsdam to talk about Japanese Layout.</p>
<p>I didn&#8217;t get much time for photos in Berlin.  These photos were mostly taken during the dinner cruise. And in Potsdam it poured with rain most of the day, so the photos look a little dark.</p>
<p>I also uploaded a bunch of photos from a trip to Berlin with the family in 2005.</p>
<p><br clear="all" /></p>
<p>There are 4 new sets of photos:</p>
<ul>
<li><a href="http://rishida.net/photos/sets/thumbs.php?set=sel-berlin">Berlin selection</a> Some of the better photos from Berlin from both trips.</li>
<li><a href="http://rishida.net/photos/sets/thumbs.php?set=sel-potsdam">Potsdam selection</a> Ditto for Potsdam.</li>
<li><a href="http://rishida.net/photos/sets/thumbs.php?set=0906-potsdam">Potsdam &#038; Berlin, 2009</a> Trip record.</li>
<li><a href="http://rishida.net/photos/sets/thumbs.php?set=0504-berlin">Berlin, 2005</a> Trip record.</li>
</ul>
<div class="post-info"></div>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&amp;p=284</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Converter tool updated and moved</title>
		<link>http://rishida.net/blog/?p=274</link>
		<comments>http://rishida.net/blog/?p=274#comments</comments>
		<pubDate>Wed, 01 Jul 2009 09:06:20 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[utilities]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=274</guid>
		<description><![CDATA[
&#62;&#62; Try it


A new version of this very popular tool is now available, in a new location. Although it is currently labeled &#8216;beta&#8217;, I recommend that you use it instead of the old version, and change any links and bookmarks to the new location. There are a number of new features.
There is also a vastly improved code base. [...]]]></description>
			<content:encoded><![CDATA[<div style="float: right; width: 270px; margin-left: 1em; margin-bottom: 1em">
<p style="font-size: 150%"><a href="http://rishida.net/tools/conversion/">&gt;&gt; Try it</a></p>
<p><a href="http://rishida.net/blog/images/converter7beta.png"><img src="http://rishida.net/blog/images/converter7beta-small.png" alt="Picture of the page in action." /></a></p>
</div>
<p>A new version of this very popular tool is now available, in a new location. Although it is currently labeled &#8216;beta&#8217;, I recommend that you use it instead of the old version, and change any links and bookmarks to the new location. There are a number of new features.</p>
<p>There is also a vastly improved code base.  If you are one of the many people who have contacted me to ask how I coded the conversions, please take a look at <a href="http://rishida.net/tools/conversion/conversionfunctions.js">the new JavaScript code</a>.  It is much cleaner and more compact.</p>
<p>New features include:</p>
<ul>
<li>A new mixed input field, and some fields have moved.</li>
<li>A new field for converting hex escapes in 0x&#8230; notation.</li>
<li>Invisible and ambiguous characters can now be made visible in the XML output.</li>
<li>Support for all HTML entities in HTML/XML input.</li>
<li>All code rewritten to use characters, rather than code points, as the internal representation. The code is also much smaller and cleaner, partly through the use of regular expression matching.</li>
<li>Various filters for conversion, such as allowing ASCII or Latin-1 characters to remain unconverted in NCR output.</li>
<li>A new icon to quickly select all the contents of a field.</li>
</ul>
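<p>To illustrate what iterating over characters, combined with an ASCII filter, can look like, here is a minimal JavaScript sketch. It is not the code in conversionfunctions.js; <code>toNCRs</code> and its <code>keepAscii</code> parameter are hypothetical names.</p>

```javascript
// Hypothetical sketch, not the converter's code: convert text to hexadecimal
// numeric character references, optionally leaving ASCII characters as-is.
function toNCRs(text, keepAscii = false) {
  let out = "";
  // for...of walks the string by code point, so a supplementary character
  // is seen as one character rather than as two UTF-16 surrogates.
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (keepAscii && cp < 0x80) {
      out += ch;
    } else {
      out += "&#x" + cp.toString(16).toUpperCase() + ";";
    }
  }
  return out;
}

console.log(toNCRs("café", true));
```

<p>A Latin-1 filter would work the same way, with the threshold raised from 0x80 to 0x100.</p>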
<p>There is also a new demonstration feature.</p>
<p>If there are no issues raised/remaining in a couple of months, I&#8217;ll remove the beta tag.</p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&amp;p=274</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New tool: Territory finder</title>
		<link>http://rishida.net/blog/?p=254</link>
		<comments>http://rishida.net/blog/?p=254#comments</comments>
		<pubDate>Sun, 22 Mar 2009 13:02:01 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[utilities]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=254</guid>
		<description><![CDATA[&#62;&#62; Use it 

This is a new tool that helps you to locate a country or territory on a map of the world.  Ever wondered where Kazakhstan is?  This will show you.
The map is in SVG and expands to fill the window. Territories are coloured red. Very small territories are marked by a [...]]]></description>
			<content:encoded><![CDATA[<p style="font-size: 150%"><a href="http://rishida.net/tools/territories/">&gt;&gt; Use it </a></p>
<p style="float: right; width: 400px; margin-left: 1em; margin-bottom: 1em"><img src="http://rishida.net/blog/images/territories.png" alt="Picture of the page in action." /></p>
<p>This is a new tool that helps you to locate a country or territory on a map of the world.  Ever wondered where Kazakhstan is?  This will show you.</p>
<p>The map is in SVG and expands to fill the window. Territories are coloured red. Very small territories are marked by a red dot.</p>
<p>The map comes from Wikipedia. The list of territories comes from the regions listed in the IANA Language Subtag Registry. I can&#8217;t guarantee that all the territories in the pulldown list are viewable, but nearly all are.</p>
<p>It&#8217;s quite a big SVG file, so it takes a little while to draw.  I&#8217;ll try to speed that up in the future. It seems to draw much faster on Chrome or Opera than on Firefox or IE.</p>
<p>For the future I have some other ideas, such as displaying the country name natively, and linking to Wikipedia articles, CLDR data, etc.  But that&#8217;s for later.</p>
<p style="color:brown;">Update: Almost every time I located a country, I found myself wondering what the neighbouring countries were. So now, as you move your mouse over a country, its name pops up.</p>
<p>Enjoy.</p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&amp;p=254</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>UniView 5.1.0d: Normalisation, New interface, Decimal code points, etc</title>
		<link>http://rishida.net/blog/?p=242</link>
		<comments>http://rishida.net/blog/?p=242#comments</comments>
		<pubDate>Sat, 14 Feb 2009 07:02:21 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[utilities]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=242</guid>
		<description><![CDATA[&#62;&#62; See what it can do !
&#62;&#62; Use it !

The major changes in this version include a new feature to normalise text as NFC or NFD, the ability to accept decimal code point values, and an overhaul of the top part of the user interface.
Added buttons to the Text area to allow conversion of the text [...]]]></description>
			<content:encoded><![CDATA[<p style="font-size: 110%"><a href="http://rishida.net/scripts/uniview/help">&gt;&gt; See what it can do !</a></p>
<p style="font-size: 150%"><a href="http://rishida.net/scripts/uniview/">&gt;&gt; Use it !</a></p>
<p style="float: right; width: 260px; margin-left: 1em; margin-bottom: 1em"><a href="http://rishida.net/blog/images/uniview510d.png"><img src="http://rishida.net/blog/images/uniview510d-small.png" alt="Picture of the page in action." /></a></p>
<p>The major changes in this version include a new feature to normalise text as NFC or NFD, the ability to accept decimal code point values, and an overhaul of the top part of the user interface.</p>
<p>Added buttons to the <span class="onscreen">Text area</span> to allow conversion of the text to NFC or NFD normalization forms. (You may not notice the change until you list the characters.)</p>
<p>I also substantially rearranged the control panel again, to make it easier, I hope, for newcomers to see what they can do.</p>
<p>The <span class="onscreen">Code point</span> conversion feature was upgraded to handle decimal code point values.</p>
<p>A single character in the codepoints area or text area is now listed in the lower left panel when you click on <img src="/rishida/scripts/uniview/images/apply.gif" alt=" " align="bottom" border="0"/>, rather than in the right-hand properties panel.  This is to improve consistency and avoid surprises.</p>
<p>Added a link to the CLDR property demo from the right panel to give access to additional properties.</p>
<p>Improved the parsing of codepoints when surrounded by text in the <span class="onscreen">Code point</span> input field, so that it now works with &amp;#x&#8230;;  and \u&#8230; and \U&#8230; escapes.</p>
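<p>Here is a minimal JavaScript sketch of this kind of parsing. It is not UniView&#8217;s own code: <code>extractCodePoints</code> is a hypothetical name, and the sketch assumes that \u is followed by exactly four hex digits and \U by exactly eight, as in C and Python escapes.</p>

```javascript
// Hypothetical sketch, not UniView's code: collect the code points written as
// &#x...; references, \uXXXX escapes or \UXXXXXXXX escapes in a piece of text,
// ignoring any surrounding text.
function extractCodePoints(text) {
  const pattern = /&#x([0-9A-Fa-f]+);|\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})/g;
  const codePoints = [];
  for (const m of text.matchAll(pattern)) {
    // Exactly one of the three capture groups matched.
    const hex = m[1] || m[2] || m[3];
    codePoints.push(parseInt(hex, 16));
  }
  return codePoints;
}

// A JavaScript string literal needs "\\u" to contain a literal backslash-u.
console.log(extractCodePoints("caf\\u00E9 and \\U0001F600"));
```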
<p>Jettisoned some unneeded code to reduce the download size by around 40 to 50 KB. Implemented the NFC/NFD feature using AJAX, to avoid pushing the download size back up.</p>
<p>When you delete the contents of the text area or the code point area, the associated input field is given focus, so you are ready for input.</p>
<p>A couple more minor bug fixes.</p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&amp;p=242</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Normalization code</title>
		<link>http://rishida.net/blog/?p=222</link>
		<comments>http://rishida.net/blog/?p=222#comments</comments>
		<pubDate>Wed, 04 Feb 2009 07:52:12 +0000</pubDate>
		<dc:creator>r12a</dc:creator>
				<category><![CDATA[code notes]]></category>
		<category><![CDATA[general]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[utilities]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://rishida.net/blog/?p=222</guid>
		<description><![CDATA[I was asked to make available the code for my normalization functions in JavaScript and PHP.  The links are below. I&#8217;m making the code available under a Creative Commons Attribution-Noncommercial-Share Alike licence. 
Disclaimers Note that I make no claim to have produced polished, compact or well-optimised code!  The code does what I need, [...]]]></description>
			<content:encoded><![CDATA[<p>I was asked to make available the code for my normalization functions in JavaScript and PHP.  The links are below. I&#8217;m making the code available under a <a href="http://creativecommons.org/licenses/by-nc-sa/3.0/">Creative Commons Attribution-Noncommercial-Share Alike</a> licence. </p>
<p><strong>Disclaimers</strong> Note that I make no claim to have produced polished, compact or well-optimised code!  The code does what I need, and I&#8217;m happy with that.  You are welcome to suggest improvements, and I&#8217;m sure there are many that could be made.</p>
<p>As they say, this code is made available in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.</p>
<p>The code is a little more convoluted than it ought to be, to work around the fact that JavaScript strings treat supplementary characters as pairs of UTF-16 surrogates rather than as single characters, and that PHP just doesn&#8217;t naturally understand Unicode. (How I long for PHP6.)</p>
<p><strong>Update: [[</strong>I meant to mention that there is a way of <a href="http://us.php.net/manual/en/class.normalizer.php">doing normalization in PHP</a> already.  I made this code available just because I had it.  I created it as a learning exercise. It may be useful, however, if you are unable to load the ICU and intl packages onto your server.<strong>]]</strong></p>
<p><strong>To use the code</strong>, simply call <code>nfc('your-text-string')</code> or <code>nfd('your-text-string')</code> from your code and capture the result.</p>
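<p>For comparison, JavaScript engines that implement ECMAScript 2015 or later have a built-in <code>String.prototype.normalize</code>, which postdates the routines linked below. Assuming such an engine, the same <code>nfc()</code>/<code>nfd()</code> calling convention can be sketched as thin wrappers around it. These wrappers are an illustration only, not the code in n11n.js.</p>

```javascript
// Thin wrappers with the same call shape as nfc()/nfd(), built on the
// built-in String.prototype.normalize rather than on n11n.js.
function nfc(text) {
  return text.normalize("NFC");
}

function nfd(text) {
  return text.normalize("NFD");
}

// "e" followed by COMBINING ACUTE ACCENT composes to U+00E9, and back again.
const composed = nfc("e\u0301");
console.log(composed === "\u00e9", nfd(composed) === "e\u0301"); // true true
```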
<p><strong>For PHP</strong> you&#8217;ll need <a href="http://rishida.net/code/showsource.php?source=normalization/n11n.php">these routines</a> and <a href="http://rishida.net/code/showsource.php?source=normalization/n11ndata.php">this data</a>.</p>
<p><strong>For JavaScript</strong> look at <a href="http://rishida.net/code/showsource.php?source=normalization/js/n11n.js">these routines</a> and <a href="http://rishida.net/code/showsource.php?source=normalization/js/n11ndata.js">this data</a>.  There is also a <a href="http://rishida.net/code/showsource.php?source=normalization/js/n11ndata-lite.js">lite version</a> of the data file that doesn&#8217;t include Han characters.  I use this sometimes for bandwidth savings (about 14K less).</p>
<p><strong>Test files</strong> I also created some test files for <a href="http://rishida.net/code/showsource.php?source=normalization/tests/n11ntestphp.php">PHP</a> and for <a href="http://rishida.net/code/showsource.php?source=normalization/tests/n11ntestjs.php">JavaScript</a>.<br />
Both of these expect to find a copy of <a href="http://www.unicode.org/Public/UNIDATA/NormalizationTest.txt">http://www.unicode.org/Public/UNIDATA/NormalizationTest.txt</a> in the local directory.  These files run 71,076 tests.</p>
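<p>If you want to build a similar harness, its core is checking each line of NormalizationTest.txt against the NFC and NFD invariants stated in that file&#8217;s header. The hypothetical <code>checkTestLine</code> function below sketches that step. It takes the normalization functions as parameters, so it can be pointed at <code>nfc</code>/<code>nfd</code> or at any other implementation. It is not the code in the test files above.</p>

```javascript
// Hypothetical harness sketch: check one line of NormalizationTest.txt.
// Returns null for lines that hold no test (comments, blanks, @Part headers),
// true if all NFC/NFD invariants from the file header hold, else false.
function checkTestLine(line, toNFC, toNFD) {
  const data = line.split("#")[0].trim();
  if (data === "" || data.startsWith("@")) return null;
  // Columns c1..c5 are separated by ";" and each holds space-separated hex code points.
  const [c1, c2, c3, c4, c5] = data.split(";").slice(0, 5).map(col =>
    String.fromCodePoint(...col.trim().split(/\s+/).map(h => parseInt(h, 16))));
  const allEqual = (expected, ...values) => values.every(v => v === expected);
  return (
    allEqual(c2, toNFC(c1), toNFC(c2), toNFC(c3)) &&
    allEqual(c4, toNFC(c4), toNFC(c5)) &&
    allEqual(c3, toNFD(c1), toNFD(c2), toNFD(c3)) &&
    allEqual(c5, toNFD(c4), toNFD(c5))
  );
}
```

<p>The file&#8217;s header also asks implementations to check that characters not listed in column 1 of Part 1 are unchanged by normalization. This sketch leaves that step out.</p>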
<p><strong>Cautions</strong> Be careful about the editor you use for the data files.  I spent several hours fruitlessly debugging the routines, only to find that Notepad++ was displaying certain supplementary characters correctly, but corrupting them on save.  I switched to Notepad and the problem evaporated. And I probably don&#8217;t need to add that editing the data files in something like Dreamweaver is a bad idea, because it will probably normalize the data before saving.</p>
<p>Another point: you may see Unicode replacement characters at a couple of points in the PHP source.  These represent the first and last characters in the high surrogate range.</p>
<p><strong>Experimenting</strong> If you want to play with something that uses this, you could try my <a href="http://rishida.net/scripts/pickers/tlicho/">Tłįchǫ (Dogrib) character picker</a>, or my <a href="http://rishida.net/tools/normalizer/">Normalizer</a> tool.  I will gradually add this to all the pickers and to <a href="http://rishida.net/scripts/uniview/">UniView</a>.  I have a local version of UniView waiting in the wings that uses the PHP files via AJAX, to reduce download size.  For that you need a file that returns the result as plain text across the wire, such as <a href="http://rishida.net/code/showsource.php?source=normalization/getn11n.php">this</a>.</p>
<p>Well, I hope that that may be of use to someone, somewhere.  I hope I haven&#8217;t forgotten anything.</p>
]]></content:encoded>
			<wfw:commentRss>http://rishida.net/blog/?feed=rss2&amp;p=222</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
