Danish Characters Issues - UTF-8

Started by mboesen, June 20, 2011, 11:06:26 AM

mboesen

Hello

I have just installed 1.5.12 and am testing it (I am still using 1.4.x until 1.5.12 works properly for my needs), and unfortunately I have run into some problems with UTF-8 and special Danish characters. The problem isn't only in 1.5, but also in 1.4. It actually started when Adobe Lightroom was upgraded from 3.3 to 3.4. I know this is not a support forum for LR, but after several discussions on the Adobe forum I will also try here.

The problem occurs when exporting images from LR and then adding them to Coppermine. In LR 3.3 there was no problem, but now in LR 3.4 the Danish characters øæå are not converted correctly. I get this:

FC Nordsjælland mod Randers FC
FC Nordsjælland mod Randers FC i Farum Park. Resultat 2-1. Mål af ....
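
A minimal PHP sketch of how this kind of garbling can arise, assuming the UTF-8 bytes written by Lightroom are read as ISO-8859-1 somewhere along the way. The function name is hypothetical and the mbstring extension is assumed to be available; this is an illustration, not Coppermine's code.

```php
<?php
// Hypothetical helper: treat the bytes of a UTF-8 string as if they were
// ISO-8859-1 and re-encode them as UTF-8. Every non-ASCII letter then
// turns into two characters.
function misread_utf8_as_latin1($utf8)
{
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

// "æ" is the two bytes C3 A6 in UTF-8; read as ISO-8859-1 they are "Ã" and "¦".
echo misread_utf8_as_latin1("FC Nordsj\xC3\xA6lland"), "\n";
// "å" is C3 A5 in UTF-8; read as ISO-8859-1 they are "Ã" and "¥".
echo misread_utf8_as_latin1("M\xC3\xA5l"), "\n";
```

The two lines printed match the garbled title and caption shown above, which suggests a UTF-8 versus ISO-8859-1 mix-up rather than damaged files.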

I started out thinking that it was LR that had the problem. But about the new version of LR, Adobe says:

"I think what you're seeing is by design. Lightroom is now following the Metadata Working Group guidelines for IPTC character set encoding, which requires always using UTF-8 character set encoding. This allows files to be exchanged correctly between systems that use different local character set encodings."

I tried to explain that Coppermine was indeed using UTF-8. Then I was directed to this....

Have a look at your image using Jeffrey's Exif Viewer (a WebInterface to ExifTool) and check the content there.

And in Jeffrey's viewer my data coming from LR is correct...

So now I really don't know where to start....

Can anyone help on this issue?

My gallery is at http://boesenfoto.dk/gallery/index.php

mboesen

Morning guys

I have now made some tests and the problem seems to be in Coppermine....

I made an image and entered øæå, and it is fine in Bridge (and other places where I can check the metadata), but not in Coppermine. The really weird thing is that I tried ÆØÅ.øæå and the only thing that showed up in Coppermine was the "."... If I added text before it, then the øæå is imported, but in weird forms.

I am trying to get someone to explain to me exactly what the difference is between LR 3.3 and 3.4.x, since it suddenly doesn't work in Coppermine. I will return once I get the explanation.

Thanks for your time

Αndré

We're talking about EXIF/IPTC data, right? Afaik a user (also from the Nordic countries) already reported a similar issue. Please read this thread and check whether the suggestions fix your issues, at least partly.

mboesen

Morning André

I did find that thread and started replying, but the system said it was an old thread and asked if I really wanted to reply, so I decided to start a new thread. After reading the thread I got the feeling that he still had the same problems and corrected them manually. That would be really hard for me when adding more than 100 images from soccer matches at a time.

I got this from the Adobe thread where I discuss if it could be a problem in Lightroom....

"In Lightroom 3.4 we updated the reading and writing of metadata to conform to the guidelines of the Metadata Working Group specification. As part of the MWG spec, some changes were made to text encoding for reading/writing various metadata fields. See http://www.metadataworkinggroup.org/p... (page 32, 33). My guess is that the Coppermine might not follow the MWG spec."

Does that help at all? I don't know much about coding, so it doesn't help me much ;-)

Thanks for your time...

mboesen

I just tried adding the config line. It changed the Danish characters again, including the ones that were already garbled :-/ But not into the correct letters. I tried uploading a new image with the mentioned characters, but still no luck...

Αndré

It's quite hard to test this with existing files. If you can, install a second test gallery on your server. Make sure the database character set is set to utf8 and check whether it makes a difference to use the additional dbcharset config line or not.

Can you please attach one or more images whose metadata causes trouble, so I can run some tests of my own? Thanks.

mboesen

Hi André

Sorry, I hadn't seen your answer. Apparently I don't get an email notification... gotta check that out ;-)

I did add the line and uploaded new photos, without luck. I will look into making a new DB, but I am not that hardcore in all this DB stuff, so I am afraid to mess up the whole thing...

But until then I will add a few files for you to use for testing....

Thanks a lot

Michael

mboesen

Still having huge problems with this. I ran a PHP check in Coppermine and found this:

iconv
iconv support:          enabled
iconv implementation:   glibc
iconv library version:  1.11

Directive                 Local Value   Master Value
iconv.input_encoding      ISO-8859-1    ISO-8859-1
iconv.internal_encoding   ISO-8859-1    ISO-8859-1
iconv.output_encoding     ISO-8859-1    ISO-8859-1

Could that be the problem? And if so, is that something I have to change on my webhost's server?

Michael

Αndré

Sorry, I overlooked this thread. I just did some testing. The IPTC data in your files is HTML-entity encoded, which seems to be the cause.

The following works for me. Open include/picmgmt.inc.php, find
                $title = (isset($iptc['Headline'])) ? $iptc['Headline'] : $title;
                $caption = (isset($iptc['Caption'])) ? $iptc['Caption'] : $caption;
                $keywords = (isset($iptc['Keywords'])) ? implode($CONFIG['keyword_separator'], $iptc['Keywords']) : $keywords;

and replace with
                $title = (isset($iptc['Headline'])) ? html_entity_decode($iptc['Headline']) : $title;
                $caption = (isset($iptc['Caption'])) ? html_entity_decode($iptc['Caption']) : $caption;
                $keywords = (isset($iptc['Keywords'])) ? implode($CONFIG['keyword_separator'], html_entity_decode($iptc['Keywords'])) : $keywords;

mboesen

Fantastic... that actually worked.....

except for one little thing. Now it doesn't include keywords at all?!

But thank you so much.... many many hours of searching the internet and trying out stuff has ended ;-)
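
On the missing keywords: a likely explanation is that $iptc['Keywords'] is an array (it is passed to implode), while html_entity_decode() expects a single string. A minimal sketch of decoding each keyword separately, assuming that is the cause. The function name, the ';' separator and the sample keywords are hypothetical, not taken from Coppermine.

```php
<?php
// Hypothetical helper: decode each keyword on its own, then join them.
// html_entity_decode() works on one string at a time, so it is applied
// per element with array_map() instead of being handed the whole array.
function decode_keywords(array $keywords, $separator)
{
    return implode($separator, array_map('html_entity_decode', $keywords));
}

// Sample data for illustration only.
$iptc = array('Keywords' => array('Fodbold', 'Farum Park', 'Kamp &amp; resultat'));

echo decode_keywords($iptc['Keywords'], ';'), "\n";
```

In picmgmt.inc.php the same idea would mean wrapping $iptc['Keywords'] in array_map('html_entity_decode', ...) inside the existing implode call.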

Αndré

I just had a closer look at the IPTC data handling. The data will be sanitized and the function htmlentities is used to do that. Please disregard/undo my last code change suggestion.

Instead, open include/iptc.inc.php, find
$data=htmlentities(strip_tags(trim($data,"\x7f..\xff\x0..\x1f")),ENT_QUOTES); //sanitize data against sql/html injection; trim any nongraphical non-ASCII character:
and replace with
$data=htmlspecialchars(strip_tags(trim($data,"\x7f..\xff\x0..\x1f")),ENT_QUOTES); //sanitize data against sql/html injection; trim any nongraphical non-ASCII character:


I'm currently not sure whether this change may cause a security vulnerability.
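
A small sketch of why the two functions can behave differently on UTF-8 input. It assumes htmlentities treats the data as ISO-8859-1, which was PHP's default charset for these functions before PHP 5.4. The charsets are passed explicitly here so the result does not depend on the PHP version, and the function names are hypothetical. The character list for trim is copied from the line quoted above.

```php
<?php
// Hypothetical stand-ins for the old and new sanitizing lines.

// trim() with the character list from iptc.inc.php: it removes bytes
// 0x7f-0xff and 0x00-0x1f from both ends of the string.
function strip_edges($data)
{
    return trim($data, "\x7f..\xff\x0..\x1f");
}

// Old behaviour (assumed ISO-8859-1): every byte of a multi-byte UTF-8
// character is turned into its own HTML entity.
function sanitize_with_htmlentities($data)
{
    return htmlentities(strip_tags(strip_edges($data)), ENT_QUOTES, 'ISO-8859-1');
}

// New behaviour: only &, <, > and quotes are escaped; UTF-8 bytes pass through.
function sanitize_with_htmlspecialchars($data)
{
    return htmlspecialchars(strip_tags(strip_edges($data)), ENT_QUOTES, 'UTF-8');
}

$title = "FC Nordsj\xC3\xA6lland mod Randers FC"; // "æ" as UTF-8 bytes

echo sanitize_with_htmlentities($title), "\n";     // FC Nordsj&Atilde;&brvbar;lland mod Randers FC
echo sanitize_with_htmlspecialchars($title), "\n"; // title unchanged, "æ" intact

// "ÆØÅ.øæå" as UTF-8: all bytes of the letters are >= 0x80, so trim()
// strips them from both ends and only the dot is left.
echo strip_edges("\xC3\x86\xC3\x98\xC3\x85.\xC3\xB8\xC3\xA6\xC3\xA5"), "\n"; // .
```

Both variants still escape <, >, & and quotes; the difference in this sketch is only in how bytes above 0x7f are handled. The last line may also explain the earlier test in this thread where only the "." survived.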

mboesen

With that change I get an upload error...

The first solution worked well, except for the keywords, but I can live with that ;-)

But if you figure out a better solution, I am all ears!

Αndré

Quote from: mboesen on August 17, 2011, 12:10:59 PM
With that change I get an upload error...

Please enable debug mode and post the error message.

mboesen

I just checked all the files and removed the dbcharset line (suggested in the Swedish thread) from the config... and now it works perfectly.

I don't know if I missed something before or just uploaded a wrong version of a file....

But it works! Great.. finally it seems that I can upgrade to 1.5.x from 1.4

Thanks a lot

nemo12

My solution for äöõü in Estonian:

function strip_IPTC($data) {
    if (is_array($data)) {
        foreach ($data as $key=>$item) {
             $data[$key]=strip_IPTC($item);
        }
    } else {
      $data=mb_convert_encoding($data, 'UTF-8', 'ISO-8859-1');

I added the last line (the mb_convert_encoding call).
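
A self-contained sketch of the same idea, written as a separate hypothetical function rather than as Coppermine's actual strip_IPTC. It assumes the mbstring extension is available and that any value which is not valid UTF-8 is ISO-8859-1. Values that are already valid UTF-8 (for example from Lightroom 3.4 or later) are left alone, so they are not converted twice.

```php
<?php
// Hypothetical helper: recursively bring IPTC values to UTF-8.
function iptc_to_utf8($data)
{
    if (is_array($data)) {
        foreach ($data as $key => $item) {
            $data[$key] = iptc_to_utf8($item);
        }
        return $data;
    }
    // Already valid UTF-8: return unchanged to avoid double encoding.
    if (mb_check_encoding($data, 'UTF-8')) {
        return $data;
    }
    // Assumption: everything else is ISO-8859-1.
    return mb_convert_encoding($data, 'UTF-8', 'ISO-8859-1');
}

// "äöõü" as ISO-8859-1 bytes, plus a keyword list with one UTF-8 entry.
$iptc = array(
    'Headline' => "\xE4\xF6\xF5\xFC",
    'Keywords' => array("M\xC3\xA5l", "j\xF5ulud"),
);
print_r(iptc_to_utf8($iptc));
```

The validity check is a heuristic: a short ISO-8859-1 string can occasionally also be valid UTF-8, so it is not a guaranteed detection.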