
1.4.1 - keyword list return incorrect result for chinese characters (utf8)

Started by itang, April 26, 2005, 06:21:28 PM

artistsinhawaii

Quote from: DJMaze on November 25, 2005, 05:48:42 PM
nvm here it is: http://dragonflycms.org/unicode/?p=60#12288

I've wrapped it up in pages of 150 chars each, or your browser might choke on it.

DJMAZE,

The use of Japanese or Chinese characters in other places works as expected.  The real issue, as I see it, is that when someone using CPG in Japanese or Chinese tries to input Kanji as their keywords and hits the spacebar between each set of Kanji to separate each "keyword", the keyword manager will save this as:

keyword1&#12288 ;keyword2&#12288 ;keyword3  (without the space between 8 and  ; )

rather than:

keyword1 keyword2 keyword3   (where the spaces are the &#32 ; (latin) variety.)

As a result, the search engine reads the first example as ONE word, not three separate words, and lists it as ONE word at the bottom of the search page.  The keyword manager also reads the entry as ONE keyword, so the linking feature fails.  If I literally replace the &#12288 with ' ' (Latin space), there is no problem.  We need to replace the &#12288 character with ' ' (its Latin equivalent) JUST FOR THE KEYWORD field before it is saved to the database.
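A minimal sketch of the behavior described above, assuming the keyword field is split on the ASCII space only. The actual splitting code in CPG is not shown in this thread, so `splitKeywords` is a hypothetical helper and the sample keywords are made up for illustration:

```php
<?php
// Hypothetical helper (not CPG code): split a keyword field on the ASCII
// space only, which is the behavior described in the post above.
function splitKeywords(string $field): array
{
    // Drop empty strings produced by consecutive spaces.
    return array_values(array_filter(explode(' ', $field), 'strlen'));
}

// "\xE3\x80\x80" is the UTF-8 byte sequence for U+3000, the ideographic
// space (&#12288;) that an Asian input method inserts for the spacebar.
$ideographicSpace = "\xE3\x80\x80";

$typed = "山" . $ideographicSpace . "川" . $ideographicSpace . "海";

// Without normalization the whole field is treated as a single keyword.
$raw = splitKeywords($typed);

// Replacing U+3000 with an ASCII space first yields three keywords.
$normalized = splitKeywords(str_replace($ideographicSpace, ' ', $typed));

echo count($raw), "\n";        // 1
echo count($normalized), "\n"; // 3
```

The replacement is applied to the keyword field only, so ideographic spaces in titles and captions would be left as typed.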

(Hope you didn't read this when I had %20 in there...   cuz %20 wouldn't work either.)

Dennis
Learn and live ... In January of 2011, after a botched stent attempt, the doctors told me I needed a multiple bypass surgery or I could die.  I told them I needed new doctors.

DJMaze

I understood; I was thinking ahead regarding other issues but posted in this topic instead of the correct forum.

Anyway, as seen on the Unicode page, use the hex code (in UTF-8, &#12288 is the three bytes E3 80 80). The following code change might work (someone needs to test it):

thumbnails.php around line 93:

$USER['search'] = $_POST;
$album = 'search';
}

Replace with:

$USER['search'] = $_POST;
// Replace the ideographic space U+3000 (UTF-8 bytes E3 80 80) with a Latin space
$USER['search']['search'] = preg_replace('#[\xE3][\x80][\x80]#', ' ', $USER['search']['search']);
$album = 'search';
}
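As a quick check of the pattern above, the following sketch confirms that &#12288; encodes to the bytes E3 80 80 in UTF-8 and that the byte-level pattern gives the same result as a code-point pattern with the `/u` modifier. The sample query string is made up for illustration; the `/u` variant is an alternative, not something proposed in the thread:

```php
<?php
// U+3000 (decimal 12288) encodes in UTF-8 as the three bytes E3 80 80.
$ideographicSpace = html_entity_decode('&#12288;', ENT_QUOTES, 'UTF-8');
echo bin2hex($ideographicSpace), "\n"; // e38080

// Made-up search input: two words separated by an ideographic space.
$query = "東京" . $ideographicSpace . "大阪";

// Byte-level pattern from the post (no /u modifier, matches raw bytes).
$byBytes = preg_replace('#[\xE3][\x80][\x80]#', ' ', $query);

// Alternative code-point pattern; /u makes PCRE treat the subject as UTF-8.
$byCodePoint = preg_replace('/\x{3000}/u', ' ', $query);

var_dump($byBytes === $byCodePoint); // bool(true)
```

The byte-level form does not require the input to be valid UTF-8, whereas `preg_replace` with `/u` returns null on malformed input, which may be one reason to prefer the byte pattern for raw `$_POST` data.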

There are 2 kinds of users in this world: satisfied and complainers.
Why do we never hear something from the satisfied users?
http://coppermine-gallery.net/forum/index.php?topic=24315.0

artistsinhawaii

@DJMAZE

Sorry, that didn't really fix it.
It allowed searching for individual characters separated by the space, but it did not resolve the issue of all the characters being strung together as ONE keyword.

In other words, if I type in three different keywords in Japanese/Chinese separated by spaces and save the settings, all three end up as ONE keyword because CPG does not recognize the Asian space character as a keyword separator, which defeats the linking feature for the individual keywords.

So the character replacement has to occur before the keywords are saved to the database OR, better yet, CPG must also recognize the Asian space character as a legitimate space character.
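The second option, treating the Asian space as a separator in its own right, could look roughly like the sketch below. It is an illustration only: `splitKeywordsUnicode` is a hypothetical helper, not a CPG function, and the thread does not show how CPG actually splits keywords.

```php
<?php
// Hypothetical helper (not part of CPG): split a keyword field on runs of
// whitespace, explicitly including the ideographic space U+3000.
function splitKeywordsUnicode(string $field): array
{
    // /u treats pattern and subject as UTF-8; PREG_SPLIT_NO_EMPTY drops
    // empty pieces caused by leading, trailing, or repeated separators.
    $parts = preg_split('/[\s\x{3000}]+/u', $field, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split returns false on error (e.g. malformed UTF-8 input).
    return $parts === false ? [] : $parts;
}

// Mixed separators: ideographic space, Latin space, ideographic space.
$field = "花\xE3\x80\x80鳥 風\xE3\x80\x80月";

print_r(splitKeywordsUnicode($field));
```

With this approach the stored keyword string is left untouched, so every place that reads keywords would need to use the same splitting rule.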

Dennis


DJMaze

The code above only fixes the search itself, not the keywords.
Since it works, here's the fix for the keywords:

In editOnePic.php, around line 41, replace:

    $title        = $_POST['title'];
    $caption      = $_POST['caption'];
    $keywords     = $_POST['keywords'];
    $user1        = $_POST['user1'];
    $user2        = $_POST['user2'];

with

    $title        = $_POST['title'];
    $caption      = $_POST['caption'];
    // Replace the ideographic space U+3000 (UTF-8 bytes E3 80 80) with a Latin space
    $keywords     = preg_replace('#[\xE3][\x80][\x80]#', ' ', $_POST['keywords']);
    $user1        = $_POST['user1'];
    $user2        = $_POST['user2'];

artistsinhawaii

@DJMAZE

Perfect!   I tried it as a keyword link and as a search keyword, and then I tried searching for a particular Japanese/Chinese character and phrase when the Kanji was in the caption and title.  All of them work.

Dennis

DJMaze

Bugfix added to the CVS 'devel' module.
It needs testing; if it works, I will commit it to 'stable'.

/devel/editOnePic.php new revision: 1.43
/devel/thumbnails.php new revision: 1.31
/devel/include/functions.inc.php new revision: 1.209
