I have a very specific question. The Accept-Language HTTP header contains a list of languages, often along with the preferred regional variants for those languages. IMHO paying attention to the regional variants would be overkill for most small to medium-large websites, although in some cases they may be significant enough to make a difference. The header can also carry relative quality factors which, when present, specify the degree to which each language is preferred on a scale of 0 to 1.
The web browser sends this header to the server, where it is interpreted. The browser typically assembles the string from a list of languages specified by its user in the user interface (for Firefox see Menu -> Options -> Content -> Languages -> Choose...; for Chrome see Menu -> Settings -> See advanced settings... -> Languages -> Language and input settings; for Opera see Application Menu -> Settings -> Languages -> Preferred languages; for Internet Explorer see Settings -> Internet Options -> General -> Appearance -> Languages).
From what I've seen, the value of the Accept-Language header sent by these browsers is a comma-separated list of fields, each consisting of a language code optionally followed by a semicolon and the relative quality factor substring q=qualityFactorHere, where qualityFactorHere is a number between zero and one. In practice, all of the browsers I've tested omit the quality factor for the first field and include quality factors for the other fields, with somewhat arbitrary and inconsistent values that are nevertheless in decreasing order. For example, for a browser where the user has specified the list of languages en, zh-cn, zh-hk, es, the raw HTTP header could look as follows:
Accept-Language: en,zh-cn;q=0.8,zh-hk;q=0.5,es;q=0.3
or as follows:
Accept-Language: en,zh-cn;q=0.8,zh-hk;q=0.6,es;q=0.4
For most practical purposes both strings convey the same information.
So my question is: given that the value of this header is available to PHP coders via the $_SERVER['HTTP_ACCEPT_LANGUAGE'] variable, what's the most reliable way to extract a PHP array of the browser's preferred languages, independent of regional variations, in descending order of preference?
Thanks!!!
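As it turns out, the task at hand is actually quite easy, as long as the browser lists the languages in descending order of preference (which all of the browsers I tested do). Here is the solution code: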
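function getLangArray() {
    // The header may be missing, e.g. for bots or command-line clients.
    $header = isset($_SERVER['HTTP_ACCEPT_LANGUAGE']) ? $_SERVER['HTTP_ACCEPT_LANGUAGE'] : '';
    $langs = array();
    foreach (explode(',', $header) as $field) {
        // Keep only the two-letter primary code, e.g. "zh" from " zh-cn;q=0.8".
        $code = strtolower(substr(trim($field), 0, 2));
        if ($code !== '') {
            $langs[] = $code;
        }
    }
    // Fields are assumed to arrive in descending order of preference.
    return array_values(array_unique($langs));
}
var_dump(getLangArray()); // for debugging purposes only
// sample output: array(3) { [0]=> string(2) "en" [1]=> string(2) "zh" [2]=> string(2) "fr" }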
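The function above trusts the client to send the fields already ordered by preference. If that cannot be taken for granted, a variant that actually reads the quality factors might look roughly like the sketch below. The name getLangArrayByQuality is made up for illustration, a missing quality factor is treated as q=1, fields with q=0 and the wildcard * are skipped, and ties keep the order in which the languages first appeared. It is an untuned sketch under those assumptions, not a full implementation of the HTTP specification.

```php
<?php
// Hypothetical helper: returns primary language codes, highest quality first.
function getLangArrayByQuality($header) {
    $best = array(); // primary language => array(best quality, first position)
    foreach (explode(',', $header) as $pos => $field) {
        $parts = explode(';', $field);
        $tag = strtolower(trim($parts[0]));
        if ($tag === '' || $tag === '*') {
            continue; // skip empty fields and the wildcard
        }
        $q = 1.0; // assumption: no q parameter means fully preferred
        foreach (array_slice($parts, 1) as $param) {
            if (preg_match('/^\s*q\s*=\s*([0-9.]+)\s*$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        if ($q <= 0) {
            continue; // q=0 marks a language as not acceptable
        }
        // Drop the regional variant: "zh-cn" becomes "zh".
        list($lang) = explode('-', $tag);
        if (!isset($best[$lang])) {
            $best[$lang] = array($q, $pos);
        } elseif ($q > $best[$lang][0]) {
            $best[$lang][0] = $q; // keep the highest quality per language
        }
    }
    $langs = array_keys($best);
    // Sort by quality (descending), then by first appearance (ascending).
    usort($langs, function ($a, $b) use ($best) {
        if ($best[$a][0] != $best[$b][0]) {
            return $best[$a][0] < $best[$b][0] ? 1 : -1;
        }
        return $best[$a][1] - $best[$b][1];
    });
    return $langs;
}
```

It takes the header value as a parameter rather than reading $_SERVER directly, so it could be called as getLangArrayByQuality(isset($_SERVER['HTTP_ACCEPT_LANGUAGE']) ? $_SERVER['HTTP_ACCEPT_LANGUAGE'] : '') and tried out with hand-written strings.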
Regards.