[Done]: [Suggestion for code review: select_lang.inc.php] automatic language detection
 


Started by ripat, July 10, 2007, 01:58:02 PM


ripat

First, I would like to say that I am impressed by the quality of Coppermine and by the amount of work it represents.

Living in a country where 3 different languages are spoken, I paid special attention to the automatic language detection based on the Accept-Language and User-Agent HTTP headers.

GENERAL REMARK

MY SUGGESTION
The code below is faster and has more features. It is faster because it uses the PCRE regex functions, which are *much* faster than the POSIX ones. In a little benchmark (100 loops), the new code is 3 times faster when there is an Accept-Language string and up to 5 times faster on the User-Agent string.

As for the new feature, the definition of the HTTP Accept-Language header in RFC 2616 says:
Each language-range MAY be given an associated quality value which represents an estimate of the user's preference for the languages specified by that range. The quality value defaults to "q=1".
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.4

My code below takes the user's preferences into account by sorting the language tokens on their weight (q=0.x).

For example: if the Accept-Language string looks like ww,ww-zz,de=0.2;q=0.1,it;q=0.5,en;q=0.3, the code will disregard the non-existent ww and ww-zz tags and pick up the available language tag that has the highest q factor, it in this case.

function lang_detect_q($available_languages) {
    if (!empty($_SERVER['HTTP_ACCEPT_LANGUAGE'])) {
        $language_tokens = explode(',', $_SERVER['HTTP_ACCEPT_LANGUAGE']);
        // loop through each Accept-Language token and find quality level (i.e. q=0.8)
        $lang_tag = $quality_tag = array();
        foreach ($language_tokens as $language_token ) {
            // strip whitespace around the token (e.g. "en, fr;q=0.8"), then explode on ;q=
            $q_explode = explode(';q=', trim($language_token));
            // if no q factor in token, default q value = 1
            $q = isset($q_explode[1]) ? (float) $q_explode[1] : 1.0;
            // add language_tag and quality_tag to array
            $lang_tag[]    = $q_explode[0];
            $quality_tag[] = $q;
        }
        // sorts array on value in reverse order (higher quality first), keeping the keys
        // array_multisort was too slow
        arsort($quality_tag);
        // loop through every quality_tag entry
        foreach ($quality_tag as $q_key => $q_val) {
            // loop through each available_languages
            foreach ($available_languages as $key => $language) {
                if (preg_match('#^(?:'. $language[0] .')#i', $lang_tag[$q_key])){
                    // exit function on first match.
                    return $available_languages[$key][1];
                }
            }
        }

    // if Accept-Language not present in the client's http header, we try the User-Agent string
    } elseif (!empty($_SERVER['HTTP_USER_AGENT'])) {     
        // once again, loop through each available_languages
        foreach ($available_languages as $key => $language) {
            if (preg_match('#[(,; [](?:'. $language[0] .')[]),;]#i', $_SERVER['HTTP_USER_AGENT'])) {
                // exit function on first match.
                return $available_languages[$key][1];
            }
        }
    }
    // if nothing found --> exit function with false (or default language value if necessary)
    return false;
}

$lang = lang_detect_q($available_languages);
// If we caught a valid language, configure it
if ($lang) {
    $USER['lang'] = $lang;
}
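
To make the q-factor ordering easier to follow, here is a minimal standalone sketch of just that step, run on the example header above. The helper name order_by_quality() and the simplified $available list (plain tags instead of regex patterns) are assumptions for illustration only; they are not part of select_lang.inc.php.

```php
<?php
// Hypothetical helper (not part of Coppermine): returns the language tags of an
// Accept-Language header, ordered by descending q value.
function order_by_quality($header) {
    $weights = array();
    foreach (explode(',', $header) as $token) {
        $parts = explode(';q=', trim($token));
        // a missing q factor defaults to 1, as RFC 2616 specifies
        $weights[$parts[0]] = isset($parts[1]) ? (float) $parts[1] : 1.0;
    }
    arsort($weights); // sort on value, highest first, keeping the tag keys
    return array_keys($weights);
}

$ordered = order_by_quality('ww,ww-zz,de=0.2;q=0.1,it;q=0.5,en;q=0.3');
// 'ww' and 'ww-zz' (q=1) come first, then 'it' (0.5), 'en' (0.3), 'de=0.2' (0.1)

// Simplified stand-in for $available_languages (assumption: plain tags only).
$available = array('it', 'en', 'de');

$picked = false;
foreach ($ordered as $tag) {
    if (in_array($tag, $available, true)) {
        $picked = $tag;
        break; // stop at the first (highest weighted) available language
    }
}
echo $picked, "\n"; // prints "it"
```

The unknown ww and ww-zz tags sort first because they default to q=1, but they are skipped since nothing in the available list matches them.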


As for the $available_languages array, the PCRE functions run slightly faster when the grouping parentheses (option1|option2) are made non-capturing, as in (?:option1|option2). So, for example:
'fr' => array('fr(?:-[[:alpha:]]{2})?|french', 'french', 'fr'),
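
A small sketch of what the non-capturing form changes, using the 'fr' pattern above. The capturing variant and the 'fr-BE' test tag are only there for comparison and are my own assumptions; the snippet does not measure speed, it only shows that both forms match the same tag while the non-capturing one stores no sub-matches.

```php
<?php
// 'fr' pattern from the post, anchored the same way lang_detect_q() anchors it.
$non_capturing = '#^(?:fr(?:-[[:alpha:]]{2})?|french)#i';
// The same pattern with ordinary (capturing) parentheses, for comparison only.
$capturing     = '#^(fr(-[[:alpha:]]{2})?|french)#i';

$tag = 'fr-BE'; // illustrative Accept-Language tag

$hit_nc = preg_match($non_capturing, $tag, $m_nc);
$hit_c  = preg_match($capturing, $tag, $m_c);

// Both variants match the tag...
var_dump($hit_nc === 1 && $hit_c === 1); // bool(true)
// ...but only the capturing variant stores sub-matches.
echo count($m_nc), ' ', count($m_c), "\n"; // prints "1 3"
```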

Let me know if something needs to be changed.

Nibbler


ripat

Yes I did.

IE 5.5
IE 6.0
IE 7.0
FF 2.0 (Linux)
FF 2.0 (OS-X)
Opera (Linux)
Opera (Windows)
Safari 9.2 (OS-X)

And even CURL and wget :=)

They are all OK, which is to be expected as they all send pretty standard Accept-Language strings. If that string is not present, as with CURL and wget, the fallback on the User-Agent string is far less effective, because User-Agent strings are far from standard and don't always contain the localisation tag.

What I mean is that the language detection relies only on strings sent by the browser in the HTTP header. Pretty straightforward. Not like that HTML/CSS stuff, where the client receives the HTML page and must parse it correctly!
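
To illustrate the User-Agent fallback, here is a rough sketch that applies the same delimiter test as lang_detect_q() to a single language pattern. The helper name ua_has_language() is hypothetical, and the two User-Agent strings are illustrative values I made up in the usual format, not strings captured from the clients listed above.

```php
<?php
// Hypothetical helper: applies the User-Agent test from lang_detect_q()
// to a single language pattern.
function ua_has_language($language_pattern, $user_agent) {
    return preg_match('#[(,; [](?:' . $language_pattern . ')[]),;]#i', $user_agent) === 1;
}

$fr = 'fr(?:-[[:alpha:]]{2})?|french';

// Illustrative User-Agent strings (assumed, not captured from real clients).
$firefox = 'Mozilla/5.0 (X11; U; Linux i686; fr; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4';
$curl    = 'curl/7.16.2 (i486-pc-linux-gnu) libcurl/7.16.2';

var_dump(ua_has_language($fr, $firefox)); // bool(true): " fr;" sits between delimiters
var_dump(ua_has_language($fr, $curl));    // bool(false): no locale token present
```

When neither header yields a match, lang_detect_q() returns false and the caller keeps its default language.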

Jean-Luc.


Nibbler


Joachim Müller

That's what I'm up to. The language manager isn't done yet. My goal is to let the admin decide if he wants language auto-selection based on browser language or not. Let's hope I get all the features done before the feature freeze stage.