1.4.1 - keyword list return incorrect result for chinese characters (utf8)
 



Started by itang, April 26, 2005, 06:21:28 PM


itang

Running 1.4.1 with utf-8 encoding. I have some pictures with Chinese-character keywords. The keyword list comes out correctly.

When I click on certain Chinese keywords, no pictures are found. The title of the search result table (at the top right-hand corner) contains corrupted Chinese characters.

However, a normal search works without problems and all the Chinese characters show up correctly.
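
Editor's sketch: one way a corrupted title like the one described above can arise is a charset mismatch when a keyword link is decoded. The thread does not confirm that this is the cause here; the keyword and the ISO-8859-1 fallback below are assumptions chosen only for illustration.

```python
from urllib.parse import quote, unquote

# Hypothetical keyword, borrowed from later in the thread.
keyword = "電腦"

# A keyword link typically carries the keyword percent-encoded as UTF-8 bytes.
encoded = quote(keyword, encoding="utf-8")

# Decoding with the matching charset restores the keyword.
good = unquote(encoded, encoding="utf-8")

# Decoding the same bytes as ISO-8859-1 (assumed wrong charset) yields
# one unrelated character per byte instead of the original two characters.
bad = unquote(encoded, encoding="iso-8859-1")

print(encoded)
print(good)
print(repr(bad))
```

If the link and the page that handles it disagree on the charset, the search term no longer equals the stored keyword, which would give both symptoms at once: no results and a garbled title.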

Casper

I have no way to test this, moved to 1.4 testing/bugs board.

itang, thanks for your report.
It has been a long time now since I did my little bit here, and have done no coding or any other such stuff since. I'm back to being a noob here

itang


Joachim Müller


Joachim Müller



artistsinhawaii

Finally feeling a little more comfortable around Coppermine, I thought I'd give the Japanese language a test drive. 

Sort function works fine for Japanese language in standard and keyword search modes, provided keywords are not in quotes. 

With Japanese, however, and this goes back to using quotes to separate keywords, I can't find a way to separate keywords in a keyword field so that each keyword is recognized as a separate word.  Japanese quotes and English quotes do not work. 

If I separate the words with spaces, they are linked as one keyword and listed in the keyword search as: 吉川 田辺 神奈川  ( ONE keyword with no surrounding quotes).
If I separate the words with quotes, they are linked as ONE keyword and listed in the keyword search as  "吉川" " 田辺" " 神奈川"   -  (ONE keyword all within quotes).

Without quotes, there is no problem searching. 

With quotes, the search function fails and the return result is:

search:               
検索結果 - ""吉川" "田辺" "神奈川""


Dennis
Learn and live ... In January of 2011, after a botched stent attempt, the doctors told me I needed a multiple bypass surgery or I could die.  I told them I needed new doctors.

Joachim Müller

We're getting nowhere with this thread. I'm marking it as "known issue"

artistsinhawaii

I don't know if this will help any of the developers, but the space character for Asian languages (Chinese/Japanese) is & # 12288; (without the spaces between &, #, and 12288), i.e. the ideographic space U+3000, as opposed to %20.

Dennis

logue

Could this be a problem with the php.ini settings?

I ran into a similar problem when translating SMF into Japanese.
The garbled characters went away after I modified php.ini as follows.

unicode.runtime_encoding = iso-8859-1
to
unicode.runtime_encoding = utf-8

;mbstring.internal_encoding = EUC-JP
to
mbstring.internal_encoding = UTF-8

;mbstring.substitute_character = none;
to
mbstring.substitute_character = 12307;

tidy.clean_output = Off
to
tidy.clean_output = On

Some of these settings may not be related...
I am translating SMF into Japanese.
Epilogue/LogueWiki
Forum

CapriSkye

it might have something to do with your database encoding...
i'm using latest apache, php, and mysql 5
everything works okay, nothing wrong with keyword search.
i even tried with the keyword itang was having trouble with.
jfyi

artistsinhawaii

Quote from: CapriSkye on November 09, 2005, 01:15:09 AM
it might have something to do with your database encoding...
i'm using latest apache, php, and mysql 5
everything works okay, nothing wrong with keyword search.
i even tried with the keyword itang was having trouble with.
jfyi

Interesting.  Did you try stringing multiple keywords? More than one keyword in the keyword field?

I found that replacing the double-byte space character with %20 allows these words to be searched separately.

Dennis

CapriSkye

here's my test site, you can try it yourself
http://www.capriskye.com/gallery

the snowboard pic is using the keywords as itang used.
the msu pic has one character in the keyword field.
and doesn't matter how many characters i used to search, i still get the correct results.

artistsinhawaii

Quote from: CapriSkye on November 09, 2005, 03:06:29 AM

and doesn't matter how many characters i used to search, i still get the correct results.

But what about characters separated by spaces?  Suppose you wanted to link a picture to three different albums?  ie

四角 丸い 細長 

Will the search function read each as a separate keyword? or will it read it all as one?  In my case, it reads it all as one.

Dennis


CapriSkye

are you saying if you search 角, it wouldn't return anything?
even though that keyword is in 四角 丸い 細長?
if so that's not the case for me.

i've changed the keywords for the snowboard to the following:
電腦 動作 雪板 雪 滑

if you just search 電, it would return that picture.
if you search 電腦動作, without the space between them, it wouldn't return anything.
but i don't think that's a problem, isn't that how it's supposed to work?
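
Editor's sketch of the behaviour described above, assuming the search is a plain substring match over the whole keyword field (similar in spirit to SQL `LIKE '%term%'`). The `matches` helper is hypothetical and is not Coppermine's actual code.

```python
# Keyword field as given for the snowboard picture: one string, space-separated.
keyword_field = "電腦 動作 雪板 雪 滑"


def matches(search_term: str, field: str) -> bool:
    """Hypothetical substring match over the raw keyword field."""
    return search_term in field


# A single character that occurs inside a keyword is found.
print(matches("電", keyword_field))

# Two keywords glued together are not found, because the stored field
# has a space between them.
print(matches("電腦動作", keyword_field))
```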

artistsinhawaii

Quote from: CapriSkye on November 10, 2005, 05:57:17 AM
if you just search 電, it would return that picture.
if you search 電腦動作, without the space between them, it wouldn't return anything.
but i don't think that's a problem, isn't that how it's supposed to work?


yep, that's how it's supposed to work.  Good to know that the problem is not in CPG but in my database encoding somewhere.  I'll sort it out.  Thanks.

Dennis

DJMaze

If you look carefully, only one character is changed into garbage.
The two question marks should actually be three question marks, since the Chinese character consists of three bytes, not two.
Usually this means that the font used in IE doesn't contain the character, or that the application can't handle the code.
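
Editor's sketch of the byte count mentioned above: a CJK character such as 電 takes three bytes in UTF-8, so a decoder that cannot handle those bytes would be expected to emit three placeholders. The sample character and the ASCII decoder are assumptions for illustration; the character from the original report is not identified in the thread.

```python
char = "電"  # sample CJK character, not the one from the original report

# UTF-8 encodes this character as three bytes.
utf8_bytes = char.encode("utf-8")
print(len(utf8_bytes), utf8_bytes.hex())

# An ASCII-only decoder replaces each of the three bytes with U+FFFD,
# the character many applications display as a question mark.
placeholders = utf8_bytes.decode("ascii", errors="replace")
print(len(placeholders))
```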

I don't have IE and I don't understand Chinese, so finding that specific character is hard for me. Maybe post it?
There are 2 kinds of users in this world: satisfied and complainers.
Why do we never hear something from the satisfied users?
http://coppermine-gallery.net/forum/index.php?topic=24315.0

artistsinhawaii

@DJMAZE,


I can't remember how I pulled this one out, but for Asian character sets, the blank space between characters when one hits the space bar is:

& # 12288; (that's without the spaces between the &, #, and 12288), i.e. the ideographic space U+3000, as opposed to %20 or & # 32; for standard Latin ASCII characters.  If this could be replaced with the Latin ASCII equivalent, that would resolve the issue.
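
Editor's sketch of the replacement suggested above, assuming keywords are split on the ASCII space. The `split_keywords` helper is hypothetical and is not part of Coppermine.

```python
IDEOGRAPHIC_SPACE = "\u3000"  # same character as the HTML entity &#12288;


def split_keywords(raw: str) -> list[str]:
    """Hypothetical helper: map U+3000 to an ASCII space, then split."""
    normalized = raw.replace(IDEOGRAPHIC_SPACE, " ")
    return [word for word in normalized.split(" ") if word]


raw = "吉川\u3000田辺\u3000神奈川"

# Splitting on the ASCII space alone leaves everything as one keyword.
print(raw.split(" "))

# After normalizing the ideographic space, each word is separate.
print(split_keywords(raw))
```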


Dennis

DJMaze

If this really is a "space" issue, then there are probably many more issues, since there are many kinds of spaces.
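
Editor's sketch of the "many kinds of spaces" point: the snippet lists every code point that Python's bundled Unicode database classifies as a space separator (general category `Zs`). The exact set depends on the Unicode version shipped with the interpreter, and the `space_separators` helper name is an assumption.

```python
import sys
import unicodedata


def space_separators() -> list[str]:
    """Return every code point whose Unicode general category is Zs."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Zs"
    ]


for ch in space_separators():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# str.split() with no argument splits on any Unicode whitespace,
# so it handles the ideographic space without special-casing it.
print("吉川\u3000田辺 神奈川".split())
```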

I will build a PHP page with the complete Unicode character set so that people can test each and every character, OK?

DJMaze

nvm here it is: http://dragonflycms.org/unicode/?p=60#12288

I've wrapped it up in pages consisting of 150 chars each, or your browser might choke on it.