[Done]: [Suggestion for code review: select_lang.inc.php] automatic language detection
 



Started by ripat, July 10, 2007, 01:58:02 PM



ripat

First, I would like to say that I am impressed by the quality of Coppermine and by the amount of work it represents.

Living in a country where 3 different languages are spoken, I paid special attention to the automatic language detection based on the Accept-Language and User-Agent HTTP headers.

GENERAL REMARK

MY SUGGESTION
The code below is faster and has more features. It is faster because it uses the PCRE regex functions, which are *much* faster than the POSIX ones. In a small benchmark (100 loops), the new code is 3 times faster when an Accept-Language string is present and up to 5 times faster on the User-Agent string.

As for the new feature, in its definition of the HTTP Accept-Language header, RFC 2616 (hosted by the W3C) says:
Each language-range MAY be given an associated quality value which represents an estimate of the user's preference for the languages specified by that range. The quality value defaults to "q=1".
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.4

My code below takes the user's preferences into account by sorting the language tokens by their weight (q=0.x).

For example, if the Accept-Language string looks like ww,ww-zz,de=0.2;q=0.1,it;q=0.5,en;q=0.3, the code will disregard the ww and ww-zz tags, which match no available language, and pick the matching language tag with the highest q factor: it in this case.

function lang_detect_q($available_languages) {
    if (!empty($_SERVER['HTTP_ACCEPT_LANGUAGE'])) {
        $language_tokens = explode(',', $_SERVER['HTTP_ACCEPT_LANGUAGE']);
        // loop through each Accept-Language token and find quality level (i.e. q=0.8)
        $lang_tag = $quality_tag = array();
        foreach ($language_tokens as $language_token ) {
            // trim whitespace (tokens are often separated by ", "), then split on ;q=
            $q_explode = explode(';q=', trim($language_token));
            // if no q factor in token default q value = 1
            $q = isset($q_explode[1]) ? $q_explode[1] : 1;
            // add language_tag and quality_tag to array
            $lang_tag[]    = $q_explode[0];
            $quality_tag[] = $q;
        }
        // sort the quality values in descending order (higher quality first), keeping the keys
        // array_multisort was too slow
        arsort($quality_tag);
        // loop through every entry of the quality_tag array
        foreach ($quality_tag as $q_key => $q_val) {
            // loop through each available_languages
            foreach ($available_languages as $key => $language) {
                if (preg_match('#^(?:'. $language[0] .')#i', $lang_tag[$q_key])){
                    // exit function on first match.
                    return $available_languages[$key][1];
                }
            }
        }

    // if Accept-Language not present in the client's http header, we try the User-Agent string
    } elseif (!empty($_SERVER['HTTP_USER_AGENT'])) {     
        // once again, loop through each available_languages
        foreach ($available_languages as $key => $language) {
            if (preg_match('#[(,; [](?:'. $language[0] .')[]),;]#i', $_SERVER['HTTP_USER_AGENT'])) {
                // exit function on first match.
                return $available_languages[$key][1];
            }
        }
    }
    // if nothing found --> exit function with false (or default language value if necessary)
    return false;
}

$lang = lang_detect_q($available_languages);
// If we caught a valid language, configure it
if ($lang) {
    $USER['lang'] = $lang;
}
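
To make the ordering in the example above easier to follow, here is a minimal sketch that only does the sorting step. The helper name sort_tags_by_quality() is hypothetical and not part of Coppermine; it takes the header value as a parameter instead of reading $_SERVER, and it uses usort() with the original position as a tie-breaker so that tags with equal q values keep their header order. It assumes PHP 5.3 or later for the anonymous function.

```php
<?php
// Hypothetical helper (not part of Coppermine): returns the language tags of an
// Accept-Language header value, ordered by descending q value.
function sort_tags_by_quality($header) {
    $entries = array();
    foreach (explode(',', $header) as $position => $token) {
        $parts = explode(';q=', trim($token));
        $entries[] = array(
            'tag' => $parts[0],
            // a missing q factor defaults to 1
            'q'   => isset($parts[1]) ? (float) $parts[1] : 1.0,
            'pos' => $position,
        );
    }
    // Sort by q descending; fall back to the original position so that
    // ties keep their header order.
    usort($entries, function ($a, $b) {
        if ($a['q'] == $b['q']) {
            return $a['pos'] - $b['pos'];
        }
        return ($a['q'] < $b['q']) ? 1 : -1;
    });
    $tags = array();
    foreach ($entries as $entry) {
        $tags[] = $entry['tag'];
    }
    return $tags;
}

// The header value from the example in the post.
$tags = sort_tags_by_quality('ww,ww-zz,de=0.2;q=0.1,it;q=0.5,en;q=0.3');
print_r($tags);
```

The two ww tags come first because they default to q=1, but since they match no available language the matching loop moves on to it, the next tag in the sorted list.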


As for the $available_languages array, the PCRE functions run slightly faster when the grouping parentheses (option1|option2) are made non-capturing, as in (?:option1|option2). So, for example:
'fr' => array('fr(?:-[[:alpha:]]{2})?|french', 'french', 'fr'),
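
As a small illustration of what non-capturing means here, the sketch below runs the fr pattern both ways on the same tag. The capturing variant is written only for comparison and is an assumption, not the original Coppermine entry; the variable names are illustrative.

```php
<?php
// The fr entry from the post, wrapped and anchored as in lang_detect_q().
$non_capturing = '#^(?:fr(?:-[[:alpha:]]{2})?|french)#i';
// The same alternation with ordinary capturing groups, for comparison only.
$capturing     = '#^(fr(-[[:alpha:]]{2})?|french)#i';

preg_match($non_capturing, 'fr-BE', $m1);
preg_match($capturing, 'fr-BE', $m2);

$count_non_capturing = count($m1); // only the full match is stored
$count_capturing     = count($m2); // full match plus the two sub-groups

// Both patterns accept the same tags; only the bookkeeping differs.
$matches_french = preg_match($non_capturing, 'French');
$matches_german = preg_match($non_capturing, 'de-DE');
```

With non-capturing groups PCRE does not have to record the sub-matches, which is where the small speed gain comes from.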

Let me know if something needs to be changed.

Nibbler


ripat

Yes I did.

IE 5.5
IE 6.0
IE 7.0
FF 2.0 (Linux)
FF 2.0 (OS-X)
Opera (Linux)
Opera (Windows)
Safari 9.2 (OS-X)

And even CURL and wget :=)

They are all OK, which is to be expected as they all send pretty standard Accept-Language strings. If that string is not present, as with CURL and wget, the fallback on the User-Agent string is far less reliable, because User-Agent strings are far from standardized and don't always contain the localisation tag.
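
To show why the User-Agent fallback is hit-or-miss, here is a rough sketch of that branch on its own. The helper detect_from_user_agent() is hypothetical (it mirrors the User-Agent branch of lang_detect_q() but takes the string as a parameter), the two-entry language table is an assumed subset, and the User-Agent strings are only illustrative examples.

```php
<?php
// Hypothetical helper mirroring the User-Agent branch of lang_detect_q(),
// with the User-Agent passed in instead of read from $_SERVER.
function detect_from_user_agent($user_agent, $available_languages) {
    foreach ($available_languages as $language) {
        // the tag must be delimited, e.g. "; fr;" or "(fr)"
        $pattern = '#[(,; [](?:' . $language[0] . ')[]),;]#i';
        if (preg_match($pattern, $user_agent)) {
            return $language[1];
        }
    }
    return false;
}

// Assumed subset of the language table, in the format shown in the post.
$available_languages = array(
    'en' => array('en(?:-[[:alpha:]]{2})?|english', 'english', 'en'),
    'fr' => array('fr(?:-[[:alpha:]]{2})?|french', 'french', 'fr'),
);

// Illustrative User-Agent strings (assumed examples).
$firefox = 'Mozilla/5.0 (X11; U; Linux i686; fr; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4';
$wget    = 'Wget/1.10.2';

$from_firefox = detect_from_user_agent($firefox, $available_languages);
$from_wget    = detect_from_user_agent($wget, $available_languages);
```

The first string carries a delimited fr tag and resolves to french; the second carries no locale information at all, so the function returns false and the default language would be used.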

What I mean is that the language detection relies on a string sent by the browser in the HTTP header. Pretty straightforward. Not like that HTML/CSS stuff, where the client receives the HTML page and must parse it correctly!

Jean-Luc.


Nibbler


Joachim Müller

That's what I'm up to. The language manager isn't done yet. My goal is to let the admin decide if he wants language auto-selection based on browser language or not. Let's hope I get all the features done before the feature freeze stage.