UPDATE: This post has now been updated, reviewed and released as part of a W3C article. See http://www.w3.org/International/questions/qa-personal-names.
Here are some more thoughts on dealing with multi-cultural names in web forms, databases, or ontologies. See the previous post.
The first thing that English speakers must remember about other people’s names is that a large majority of them don’t use the Latin alphabet, and that, of those that do, a majority use accents and characters that don’t occur in English. It seems obvious, once I’ve said it, but it has some important consequences for designers that are often overlooked.
If you are designing an English form, you need to decide whether you are expecting people to enter names in their own script or in an ASCII-only transcription. What people type into the form will often depend on whether the form and its page are in their language or not. If the page is in their language, don’t be surprised to get back non-Latin or accented Latin characters.
If you hope to get ASCII-only, you need to tell the user.
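Telling the user is the main thing, but a form handler can also back up the instruction with a check. Here is a minimal sketch in Python, assuming a hypothetical helper called `ascii_name_error` and made-up message wording; neither comes from any particular framework.

```python
from typing import Optional

# Hypothetical wording; adapt it to the language of your page.
EMPTY_MESSAGE = "Please enter your name."
ASCII_ONLY_MESSAGE = (
    "Please enter your name using unaccented Latin letters only "
    "(for example, Bjork rather than Björk)."
)


def ascii_name_error(name: str) -> Optional[str]:
    """Return a message for the user if the name is unusable, else None."""
    stripped = name.strip()
    if not stripped:
        return EMPTY_MESSAGE
    # str.isascii() is True only if every character is in U+0000..U+007F,
    # so hyphens and apostrophes pass but accented letters do not.
    if not stripped.isascii():
        return ASCII_ONLY_MESSAGE
    return None
```

The check only rejects input; it does not try to strip accents or transliterate, since the user is better placed than the code to choose their own transcription.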
The decision about which is most appropriate will depend to some extent on what you are collecting people’s names for, and how you intend to use them.
- Are you collecting the person’s name just to have an identifier in your system? If so, it may not matter whether the name is stored in ASCII-only or native script.
- Or do you plan to call them by name on a welcome page or in correspondence? If you will correspond using their name on pages written in their language, it would seem sensible to have the name in the native script.
- Is it important for people in your organization who handle queries to be able to recognise and use the person’s name? If so, you may want to ask for a transcription.
- Will their name be displayed or searchable (for example, Flickr optionally shows people’s names as well as their user name on their profile page)? If so, you may want to store the name in both ASCII and native script. In that case you probably need to ask the user to submit their name in both forms, using separate fields.
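To make the last option concrete, here is a rough sketch of a record that keeps both forms and a search that looks at either one. The class name `PersonName`, its fields, and the function `find_by_name` are all invented for illustration, and the substring matching is deliberately naive.

```python
import unicodedata
from dataclasses import dataclass
from typing import List


@dataclass
class PersonName:
    native: str         # the name as the user writes it in their own script
    transcription: str  # the ASCII-only form the user supplied separately


def _search_key(text: str) -> str:
    # Normalize to NFC so composed and decomposed accents compare equal,
    # then casefold for case-insensitive matching.
    return unicodedata.normalize("NFC", text).casefold()


def find_by_name(people: List[PersonName], query: str) -> List[PersonName]:
    """Return the people whose native or transcribed name contains the query."""
    key = _search_key(query)
    return [
        p for p in people
        if key in _search_key(p.native) or key in _search_key(p.transcription)
    ]


people = [
    PersonName(native="山田太郎", transcription="Yamada Taro"),
    PersonName(native="Björk Guðmundsdóttir", transcription="Bjork Gudmundsdottir"),
]
```

With this shape, a support person can search for `yamada` while the welcome page still displays 山田太郎.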
Note that if you intend to parse a name, you may need to use country- or language-specific algorithms to do so correctly (see the previous blog post on personal names).
If you do accept non-ASCII names, you should use the UTF-8 encoding in your pages, in your back-end databases and in all the scripts in between. This will significantly simplify your life.
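As a small illustration of why using one encoding everywhere helps, the Python sketch below round-trips a few names through UTF-8 bytes and then shows what happens when one step decodes with a different encoding. The function name `utf8_round_trip` and the sample names are just for the example.

```python
def utf8_round_trip(name: str) -> str:
    """Encode a name to UTF-8 bytes and decode it again with the same encoding."""
    stored = name.encode("utf-8")   # what a page, script or database would hold
    return stored.decode("utf-8")


names = ["Björk Guðmundsdóttir", "山田太郎", "John Smith"]
for name in names:
    # Every name survives unchanged when both ends agree on UTF-8.
    assert utf8_round_trip(name) == name

# Byte length is not character length: size database columns accordingly.
character_count = len("山田太郎")
byte_count = len("山田太郎".encode("utf-8"))

# If one step in the chain assumes Latin-1 instead, the name comes back garbled.
garbled = "Björk".encode("utf-8").decode("latin-1")
```

The garbled result is the familiar kind of mojibake that appears when a page, a script and a database disagree about the encoding.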