The IE7 blog just announced Microsoft’s intention to change the way browser preferences for Accept-Language are set up by default. Basically, your preferences will no longer default to fr if you’re French, but to fr-FR instead, i.e. your locale as determined by Windows settings.
I think this is going to cause major problems with content negotiation on the Web.
To give a practical example:
Set your language settings to just es-MX and/or es-ES and point your browser to this article on the W3C site (an article explaining how to set language preferences).
You’ll get back the English version, even though there’s a Spanish version there. Someone with es set in IE6, Opera or Firefox will see the Spanish version automatically – even if their preferences are es-MX then es.
This is down to the way language negotiation is done on the Apache server.
In the article linked to above we explain that “Some of the server-side language selection mechanisms require an exact match to the Accept-Language header. If a document on the server is tagged as fr (French) then a request for a document matching fr-CH (Swiss French) will fail. To ensure success you should configure your browser to request both fr-CH and fr.”
This is from the Apache 2 documentation:
The server will also attempt to match language-subsets when no other match can be found. For example, if a client requests documents with the language en-GB for British English, the server is not normally allowed by the HTTP/1.1 standard to match that against a document that is marked as simply en. (Note that it is almost surely a configuration error to include en-GB and not en in the Accept-Language header, since it is very unlikely that a reader understands British English, but doesn’t understand English in general. Unfortunately, many current clients have default configurations that resemble this.)
Apache 2 introduces “some exceptions … to the negotiation algorithm to allow graceful fallback when language negotiation fails to find a match”, but those using Apache 1 don’t have that luxury.
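The difference between strict matching and a graceful fallback can be sketched in a few lines of Python. This is only an illustration of the idea, not Apache’s actual algorithm: the function names and the list of available variants are assumptions made up for the example, and q-values are ignored.

```python
def exact_match(accept_tags, available):
    """Return the first requested tag the server has verbatim, else None."""
    available_lower = {tag.lower() for tag in available}
    for tag in accept_tags:
        if tag.lower() in available_lower:
            return tag.lower()
    return None


def match_with_fallback(accept_tags, available):
    """Try exact matches first, then retry with the region subtag dropped."""
    hit = exact_match(accept_tags, available)
    if hit is not None:
        return hit
    # Truncate each requested tag to its primary language subtag (es-MX -> es).
    primary = [tag.split("-")[0] for tag in accept_tags]
    return exact_match(primary, available)


# Hypothetical server with an English and a Spanish version of one page.
variants = ["en", "es"]

print(exact_match(["es-MX", "es-ES"], variants))          # None
print(match_with_fallback(["es-MX", "es-ES"], variants))  # es
print(exact_match(["es-MX", "es"], variants))             # es
```

When strict matching returns nothing, the server serves whatever its default is, which is how a reader asking only for es-MX ends up with the English page.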
Apart from the fact that most users wouldn’t even know that they can set their browser preferences differently, not to mention knowing how to do that, IE7 CR1 doesn’t even provide a preset selection for es rather than es-ES – you have to enter it manually. Not likely to happen much.
It seems to me that a simple fix to this would be for IE7 to set the user’s default preferences to *also* include es (i.e. es-ES, es for Spain, fr-FR, fr for France, etc.). Then, when a file such as qa-lang-priorities.fr-fr.html is not found, the server will find qa-lang-priorities.fr.html and return a French file. Those people who want to know where the user’s browser is (likely to be) physically located can still use the fr-FR information to get the locale.
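As a rough sketch of what such a default could look like, the Python below appends the bare language after the regional tags and serializes the result as an Accept-Language value. The function names are hypothetical, the descending q-values are an assumption about how a browser might rank the entries, and tags are compared case-sensitively for simplicity. This is not how IE7 builds its header.

```python
def with_language_fallback(tags):
    """Add the bare language after the last regional tag of that language."""
    expanded = []
    for index, tag in enumerate(tags):
        if tag not in expanded:
            expanded.append(tag)
        primary = tag.split("-")[0]
        # Wait until the last tag of this language before adding the fallback.
        later_same_language = any(
            later.split("-")[0] == primary for later in tags[index + 1:]
        )
        if not later_same_language and primary not in expanded:
            expanded.append(primary)
    return expanded


def accept_language_header(tags):
    """Serialize tags in preference order with descending q-values."""
    parts = []
    for index, tag in enumerate(tags):
        q = max(1.0 - 0.1 * index, 0.1)
        parts.append(tag if index == 0 else f"{tag};q={q:.1f}")
    return ",".join(parts)


print(accept_language_header(with_language_fallback(["fr-FR"])))
# fr-FR,fr;q=0.9
print(accept_language_header(with_language_fallback(["es-MX", "es-ES"])))
# es-MX,es-ES;q=0.9,es;q=0.8
```

A server that only does exact matching would then still find the fr or es version, while the regional tag stays first for anyone who wants the locale.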
I think that the result of ignoring this is that many people will be confused about why they no longer see a page in Spanish, when they did before, and a lot of hard work by content developers will go unnoticed on the Web. In short, I think Microsoft is about to introduce a serious bug into IE7.
Note, in passing, that the rules for specifying the lang attribute in HTML and xml:lang in XHTML are described by BCP 47. The latest syntax and matching specifications are RFC 4646 and RFC 4647, which obsolete RFC 3066 and RFC 1766, and which tell you to go to the IANA Language Subtag Registry at http://www.iana.org/assignments/language-subtag-registry to find out what language codes to use, rather than the ISO code lists. For more information, see http://www.w3.org/International/articles/language-tags/
Btw, I tried posting this as a comment on the IE7 blog page, but it didn’t work (site busy) so I did it here.

October 18th, 2006 at 10:44 pm
I agree with Michael Kaplan that this is a bug in Apache, not in IE7.
Successful open source packages are full of bugs like this – cases where standards simply aren’t respected or appreciated – where commercial software is simply expected to adapt. That’s not really a fair expectation. If commercial software is expected to conform to applicable standards, I don’t see why open source software should not be expected to respect the same standards as well.
October 19th, 2006 at 1:32 am
Why will IE7 do that?
What are the reasons behind it?
Is it because some customers complained that they received an American English version when they were British, so Microsoft decided to be stricter about the headers?
In this case, would it be better to propose a mechanism in browsers that would tell the user:
“Hey you have received the en-US version of this document. There are other versions: fr-FR, fr-CA, en-GB. Which one would you like?”
October 19th, 2006 at 7:06 am
Dennis, I don’t disagree with you. I’m just trying to point out that once people start using IE7 as it currently stands, a lot of sites will fail to work like they did.
Microsoft has worked hard in other areas to ensure that people’s sites don’t break, and I’m trying to make sure that they are aware of the potential issue here.
October 19th, 2006 at 7:22 am
Karl, if I understood correctly, I think the idea is that it would be easier to get a fix on the geographical location of the browser user than before. This can be useful, for example, for automatically displaying a weather report in part of a page, or expressing amounts in the right currency, or routing to the right country-specific page. The Accept-Language information becomes a locale tag rather than just a language tag.
October 19th, 2006 at 1:17 pm
If the idea is to get a geographical fix, then what if I’m a Frenchman who emigrated to the US ten years ago? I write fr-US? That’s not RFC-compliant.
October 19th, 2006 at 1:42 pm
Hi fantasai. fr-US is perfectly fine according to RFC 4646. Note also that you can always change your settings if you need to – and you’re not restricted to a set list. And in fact, when you install IE7 it asks you to set this up – which is really quite good.
Trouble is, in my experience, people don’t really have a good idea how to do this (as you showed) – even if they are aware that it is even possible or necessary after installation. That’s why I’m concerned about the missing fr in the default setup.
October 20th, 2006 at 7:38 am
So, first of all, Apache’s behaviour here is correct and is described in full in the standard, as I explained on Michael Kaplan’s blog when he took a similar line to Dennis.
Internet Explorer should help the user to set up a meaningful list (like, say, a preference for Australian English, followed by British English, then any other English variant), and it doesn’t seem to do a decent job of that despite a /clear warning/ in the HTTP standard that this is needed.
Secondly the geographical location stuff described above is mistaken. A Frenchman in the US almost certainly wants either fr-FR (French) or en-US (American English) or perhaps both in that order, and if he pops over into Canada he doesn’t suddenly want fr-CA (Canadian French). All Accept-Language does is tell you as precisely as possible what language the user prefers to read documents in. It should NOT be used to try to determine where the user is currently, or where they live.
October 26th, 2006 at 4:12 pm
It’s the same as the IE6 default for English: en-US.
While en-US,en is clearly a better client setting, I suspect that in practice, an en-US request will result in en resources being served up.
October 31st, 2006 at 4:19 am
fr-US means I speak the American variant of French. Perhaps there is such a thing, perhaps there isn’t, but at any rate I want the French French version of the page, not the American French version. That I happen to be in the US is irrelevant to what language I prefer to read in.
Let’s use perhaps a clearer example. Suppose I am American. I emigrate to New Zealand. US English remains my first language and my preferred dialect. But I don’t want to read the New Zealand English version of a web page if there’s an American English version. My language preference therefore should be en-US, not en-NZ.
If I’m in New Zealand, it would be *very nice* if websites got a geographic fix for New Zealand, since network connections are closer from California and Japan than from New York or London. But the language preference isn’t the place to get that fix.
October 31st, 2006 at 9:52 am
Dave, actually it doesn’t, unfortunately. That’s why I wrote the blog post. If you have, say, filename.en.html and filename.fr.html on your Apache 1.3 server and set up a fallback default of filename.html (in, say, French), as recommended by the Apache documentation, you will get the French document back.
November 2nd, 2006 at 1:36 pm
After some discussion last night, it appears that Microsoft’s IIS web server may also show the problems created by this change from Microsoft.
The real problem is a result of the definition of HTTP 1.1, not the fault of any web server manufacturer.
The HTTP RFC 2616 Section 15.1.4 also indicates that the practice of declaring precise accept-language may introduce security issues too.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec15.html#sec15.1.4
I strongly support Richard’s suggestion that putting just the language, with a lower q score, as well as the language and region would solve the issues.
November 5th, 2006 at 1:18 am
I’m not an IE user, not even Windows user, but I think this is the way to go.
Sending a full locale string (instead of only the language) may tell a web application much more about the user, for example their currency, their date and time formats, etc. So an es_US locale may return a Spanish page with dollars instead of euros or pesos.
Yes, things will break for some time, but Apache will fix/improve the way it manages languages very easily.
November 7th, 2006 at 10:52 am
[...] Under this title, Richard Ishida, leader of the W3C Internationalization Activity, published an interesting article on his blog, which is translated here for the information and comments of my readers. [...]