IPTC keywords with special characters from Mac to Coppermine

Started by zeppo, August 11, 2007, 09:32:21 PM


zeppo

I have a Finnish gallery, and I have written the title and keywords straight into the JPG files (IPTC metadata). The problem has been the special characters (umlauts) we use in Finland. Photoshop on the Mac writes IPTC data in MacRoman encoding, so the characters have to be converted to UTF-8 before they are written to the database.

I found a script that converts MacRoman extended characters to UTF-8 characters.
It worked OK with older versions. Now that I have tried to upgrade from 1.3.x to 1.4.12, I found that the script does not work properly.

If the umlaut character is at the beginning or at the end of a word, it disappears.

I am no coding expert, so it would be nice if someone could find a solution. I also think the solution could be useful to other users too :)

I have changed the file /include/picmgmt.inc.php, starting from line 43.

Here is the original version:

        if ($CONFIG['read_iptc_data']) {
           $iptc = get_IPTC($image);
           if (is_array($iptc) && !$title && !$caption && !$keywords) {  //if any of those 3 are filled out we don't want to override them, they may be blank on purpose.
               $title = (isset($iptc['Title'])) ? $iptc['Title'] : $title;
               $caption = (isset($iptc['Caption'])) ? $iptc['Caption'] : $caption;
               $keywords = (isset($iptc['Keywords'])) ? implode(' ',$iptc['Keywords']) : $keywords;
           }


And here is the modified version:

        if ($CONFIG['read_iptc_data']) {
           $iptc = get_IPTC($image);
           if (is_array($iptc) && !$title && !$caption && !$keywords) {  //if any of those 3 are filled out we don't want to override them, they may be blank on purpose.
               $title = (isset($iptc['Title'])) ? $iptc['Title'] : $title;
               $title=ereg_replace(128, "Ä",$title);
               $title=ereg_replace(138, "ä",$title);
               $title=ereg_replace(133, "Ö",$title);
               $title=ereg_replace(154, "ö",$title);
               $title=ereg_replace(134, "Ü",$title);
               $title=ereg_replace(159, "ü",$title);
               $title=ereg_replace(205, "Õ",$title);
               $title=ereg_replace(155, "õ",$title);
               $title=ereg_replace(129, "Å",$title);
               $title=ereg_replace(140, "å",$title);
               $title=ereg_replace(175, "Ø",$title);
               $title=ereg_replace(191, "ø",$title);
               $title=ereg_replace(190, "æ",$title);
               $title=ereg_replace(174, "Æ",$title);
               $title=ereg_replace(169, "©",$title);

               $caption = (isset($iptc['Caption'])) ? $iptc['Caption'] : $caption;
               $caption=ereg_replace(128, "Ä",$caption);
               $caption=ereg_replace(138, "ä",$caption);
               $caption=ereg_replace(133, "Ö",$caption);
               $caption=ereg_replace(154, "ö",$caption);
               $caption=ereg_replace(134, "Ü",$caption);
               $caption=ereg_replace(159, "ü",$caption);
               $caption=ereg_replace(205, "Õ",$caption);
               $caption=ereg_replace(155, "õ",$caption);
               $caption=ereg_replace(129, "Å",$caption);
               $caption=ereg_replace(140, "å",$caption);
               $caption=ereg_replace(175, "Ø",$caption);
               $caption=ereg_replace(191, "ø",$caption);
               $caption=ereg_replace(190, "æ",$caption);
               $caption=ereg_replace(174, "Æ",$caption);
               $caption=ereg_replace(169, "©",$caption);

               $keywords = (isset($iptc['Keywords'])) ? implode(' ',$iptc['Keywords']) : $keywords;
               $keywords=ereg_replace(128, "Ä",$keywords);
               $keywords=ereg_replace(138, "ä",$keywords);
               $keywords=ereg_replace(133, "Ö",$keywords);
               $keywords=ereg_replace(154, "ö",$keywords);
               $keywords=ereg_replace(134, "Ü",$keywords);
               $keywords=ereg_replace(159, "ü",$keywords);
               $keywords=ereg_replace(205, "Õ",$keywords);
               $keywords=ereg_replace(155, "õ",$keywords);
               $keywords=ereg_replace(129, "Å",$keywords);
               $keywords=ereg_replace(140, "å",$keywords);
               $keywords=ereg_replace(175, "Ø",$keywords);
               $keywords=ereg_replace(191, "ø",$keywords);
               $keywords=ereg_replace(190, "æ",$keywords);
               $keywords=ereg_replace(174, "Æ",$keywords);
               $keywords=ereg_replace(169, "©",$keywords);
           }
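
For readers on newer PHP versions: ereg_replace() was deprecated in PHP 5.3 and removed in PHP 7, so the replacement chain above will not run there. The following is only a sketch of the same MacRoman to UTF-8 mapping done in a single pass with strtr(). The function name macroman_to_utf8 is hypothetical and not part of Coppermine, and the sketch is not claimed to cure the disappearing-umlaut problem described in this thread, whose cause is not established here. The UTF-8 values are written as byte escapes so that the encoding in which the PHP file itself is saved does not matter.

```php
<?php
// Hypothetical helper (not part of Coppermine): converts the MacRoman
// bytes listed in the post above to their UTF-8 equivalents.
function macroman_to_utf8($text)
{
    // Keys: single MacRoman bytes. Values: UTF-8 byte sequences.
    static $map = array(
        "\x80" => "\xC3\x84", // 128 Ä
        "\x8A" => "\xC3\xA4", // 138 ä
        "\x85" => "\xC3\x96", // 133 Ö
        "\x9A" => "\xC3\xB6", // 154 ö
        "\x86" => "\xC3\x9C", // 134 Ü
        "\x9F" => "\xC3\xBC", // 159 ü
        "\xCD" => "\xC3\x95", // 205 Õ
        "\x9B" => "\xC3\xB5", // 155 õ
        "\x81" => "\xC3\x85", // 129 Å
        "\x8C" => "\xC3\xA5", // 140 å
        "\xAF" => "\xC3\x98", // 175 Ø
        "\xBF" => "\xC3\xB8", // 191 ø
        "\xBE" => "\xC3\xA6", // 190 æ
        "\xAE" => "\xC3\x86", // 174 Æ
        "\xA9" => "\xC2\xA9", // 169 ©
    );

    // With an array argument, strtr() scans the input once and never
    // re-examines text it has already replaced, so the order of the
    // entries does not matter.
    return strtr($text, $map);
}
```

Under these assumptions, each of the three fields would be passed through the helper once, for example $title = macroman_to_utf8($title);. It must only be applied to data known to be MacRoman: running it on text that is already UTF-8 would corrupt it.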



Joachim Müller


zeppo

I just have to add that the solution is not perfect.

For some reason, if the special character (umlaut) is at the beginning or at the end of a word, it disappears completely. So there is a bug. I just do not have the skills to find out where.  ???

I have added only those umlauts we use in Finland; if you would like to use other ones, you just have to add them there.

You can find the numeric character codes for different kinds of encodings, for example, on Alan Wood's site: http://www.alanwood.net/demos/macroman.html

Joachim Müller

You might want to take a look at one of the files that come with coppermine - there are functions within it that are supposed to offer replace functions that are unicode-safe. Review include/mb.inc.php

zeppo

Thanks GauGau!

I will take a look. If I find a fully working solution, I will report it here too.

zeppo

I do not have the skills to go any further with this. Just in case someone gets interested in solving this, here are the things I have found so far.

1. The script worked OK in an older version of Coppermine (1.2?). That version was not able to read IPTC information, so I used an IPTC -> Title & Keywords mod from Bill_S to read the IPTC metadata.

2. With Coppermine 1.3.5 the script works almost OK. Keywords can be seen there, and special characters are correctly written. To make the keywords searchable, I have to click "edit description" and then "apply modifications". After that, keywords with special characters become searchable too.

3. With Coppermine 1.4.12 you have to do this "apply modifications" trick to make keywords searchable. And if special characters are at the beginning or at the end of a word, they disappear altogether. (In most cases, that is; sometimes a special character at the end of a word survives. I have not found any logical reason why this happens, though) :]


Joachim Müller

Quote from: zeppo on August 13, 2007, 11:10:12 AM
1. The script worked OK in an older version of Coppermine (1.2?). That version was not able to read IPTC information, so I used an IPTC -> Title & Keywords mod from Bill_S to read the IPTC metadata.

2. With Coppermine 1.3.5 the script works almost OK. Keywords can be seen there, and special characters are correctly written. To make the keywords searchable, I have to click "edit description" and then "apply modifications". After that, keywords with special characters become searchable too.

Coppermine versions prior to cpg1.4.0 didn't use utf-8 as the default encoding, but used a proprietary encoding (iso8859-1 mostly), which worked reasonably well with the IPTC features, which are iso8859-1 only. I'm no IPTC expert and have no idea if it supports utf-8 in the first place. As long as cpg1.4.x is utf-8 (unicode) encoded and IPTC is iso8859-1 encoded, you'll have the issues you described (and to some extent were able to fix for a limited number of special chars). However, the unicode universe consists of thousands of non-latin characters, so the method you chose (performing a resource-intensive replace) is not an option for all of them.

In an ideal world, we could make the IPTC data unicode based and offer a converter that does the conversion from iso8859-1 to utf-8 once. But I'm afraid that only very few users (professional and semi-professional photographers mostly) take advantage of IPTC. Again, most of them use English, which doesn't contain special characters, so the discrepancy between encodings won't matter. For the (comparatively small) number of users who actually use IPTC together with a language other than English, it would be nice to come up with a solution. However, none of the current coppermine devs is an expert on such issues, and there are features that are currently considered more important.

Bottom line: we'd love to see someone come up with a solution that could go into Coppermine's core. Meanwhile, you'll have to live with workarounds, I'm afraid. Sorry for that.
If anybody can come up with a working solution, we'll gladly look into it and add it to coppermine's core for future versions.
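
As an illustration of the one-time iso8859-1 to utf-8 conversion mentioned above, here is a minimal sketch. It assumes PHP's mbstring extension is available and that the input really is iso8859-1 whenever it is not valid utf-8. The function name iptc_latin1_to_utf8 is hypothetical and does not exist in Coppermine.

```php
<?php
// Hypothetical helper (not part of Coppermine): converts an iso8859-1
// string to utf-8, leaving strings that are already valid utf-8 alone.
function iptc_latin1_to_utf8($text)
{
    // Plain ASCII and correctly encoded utf-8 pass this check unchanged,
    // which avoids converting the same value twice.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }

    // Assumption: anything that is not valid utf-8 is iso8859-1.
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}
```

The validity check is only a heuristic: a short iso8859-1 string can, in rare cases, also happen to be valid utf-8, and MacRoman input would be converted to the wrong characters, so the source encoding still has to be known in advance.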

zeppo

Thanks for the thorough answer, GauGau.

This really is not a Coppermine problem.

IPTC comes from the USA, where they do not have the special characters that many European languages use.
Photoshop on Windows uses iso8859-1 for IPTC data. Photoshop on the Macintosh uses MacRoman encoding, so with the earlier version of Coppermine I had to convert special characters from MacRoman to iso8859-1. Later I changed to a utf-8 based system.

It is clear that the Coppermine developers do not need to solve this. It is my headache really. :)
I will try to find some way to solve this, and report here if there is a solution.