
IPTC Keywords in uploaded files

Started by jaus, October 07, 2013, 12:10:05 AM


jaus

I have Coppermine configured to read IPTC data from .jpg files. After some recent batch uploads I noticed that the files are imported and their IPTC data is retained, but the keywords brought into my CPG database are only a subset of the IPTC keywords stored in the image.

How does Coppermine filter out some keywords and how can that be overcome so that all keywords are imported with the image?

This is not a field size limitation: some of my keyword sets would easily fit within the 255-character field, yet only a subset of them is imported.

Αndré

Please post at least one example of the original and the imported keywords. Ideally, you also attach the corresponding image to your reply so we can test it ourselves.

jaus

Sorry for the late reply, been out of town.

Here are the keywords from the attached sample image:

   aquatic
   California
   California Sea otter (Enhydra lutris)
   cute
   fur
   furry
   mammal
   mammalian
   mammals
   Mammals
   marine
   Marine
   Moss Landing
   Nature
   North America
   North American
   Sea Otter
   sealife
   USA
   wildlife
   Wildlife


Here are the keywords that appear for the image when displayed in editpics.php:

aquatic;California;cute;fur;furry;mammal;mammalian;mammals;marine;North America;North American;sealife;USA;wildlife

Αndré

For some reason the picture I downloaded doesn't contain any metadata. It's also just ~200KB instead of ~400KB. Please attach it again as a zip file.

Coppermine doesn't add keywords that differ only in character case, which is why "marine", "mammals" and "wildlife" are each added only once. But I can see that there seems to be an issue with

  • California Sea otter (Enhydra lutris)
  • Moss Landing -- missing
  • Nature
  • Sea Otter

They all contain a space, but as other keywords with spaces are added correctly, that's probably not the reason.
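
The case-insensitive behaviour described above can be sketched roughly as follows. This is only an illustration under assumptions, not Coppermine's actual code: the function name dedupe_keywords_ci is hypothetical, and it simply keeps the first spelling of each keyword and drops later ones that differ only in case.

```php
<?php
/**
 * Hypothetical helper (not part of Coppermine): remove keywords that
 * differ only in character case, keeping the first spelling seen.
 */
function dedupe_keywords_ci(array $keywords): array
{
    $seen = [];    // lower-cased keywords already accepted
    $unique = [];  // result, in original order

    foreach ($keywords as $keyword) {
        $keyword = trim($keyword);
        if ($keyword === '') {
            continue; // ignore empty entries
        }
        // strtolower() is enough for ASCII keywords; multibyte text
        // would need mb_strtolower() from the mbstring extension.
        $key = strtolower($keyword);
        if (!isset($seen[$key])) {
            $seen[$key] = true;
            $unique[] = $keyword;
        }
    }

    return $unique;
}
```

Joining the result with implode(';', ...) would give a semicolon-separated string like the one shown in editpics.php. Applied to the sample list, this accounts for "Mammals", "Marine" and "Wildlife" being dropped, but not for the four keywords listed above.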

jaus

It looks like this problem is actually being caused by a bizarre glitch in the software I use to copy the JPEGs into the Coppermine album folders. Some keywords are stripped off in the process, but only in JPEGs, not in raw files. I'll have to report it to the developer.

Sorry for the false alarm.

jaus

Update: This also happens when copying and pasting files via Windows (Ctrl+C, Ctrl+V), so it's not a fault of my cataloging software. It only happens with .jpg files, not with .cr2 files, and seems to randomly leave out about 1 in 5 keywords.

Does anyone know what might cause this?

Αndré

I don't get what's actually going on. You export your raw file to a jpg file. At this point, the jpg file contains all keywords. Now you copy the jpg file to another place and suddenly some keywords are missing? That doesn't make sense to me.

jaus

It doesn't make sense to me either, but that is what I see happening.   I will have to experiment more to see if I can identify an underlying cause.

Αndré

I assume your initially exported jpg file didn't contain all the keywords in the first place.

jaus

Well, that is difficult to say. It appears this is not related to copying after all. When looking at the JPEG in its original location, PS sees all of the keywords, IMatch (my cataloging software) sees all of the keywords, but BreezeBrowser does not. BB sees the same subset of words that Coppermine is importing.

If I copy the image to a new location the same situation exists, so it is not related to copying but appears to lie in how different programs read the file.

So, PS and IMatch can see all the keywords, BB and CPG cannot. I don't get it.

phill104

Which particular keywords are read? PS uses quite a large set, including some custom ones based on manufacturer. In CPG we only read certain parts of the IPTC data, as you can see from the code below. We should review this list for CPG 1.6. At least one field is deprecated.

function get_IPTC($filename) {
    $IPTC_data = array();
    // getimagesize() fills $info with the file's APPn segments; IPTC data lives in APP13
    $size = getimagesize($filename, $info);
    if (isset($info["APP13"])) {
        $iptc = iptcparse($info["APP13"]);
        if (is_array($iptc)) {
            // Tags missing from the file are read here without an isset() check
            $IPTC_data = array(
                "Title"                 => $iptc["2#005"][0], # Max 65 octets, non-repeatable, alphanumeric
                "Urgency"               => $iptc["2#010"][0], # Max 1 octet, non-repeatable, numeric, 1 - High, 8 - Low
                "Category"              => $iptc["2#015"][0], # Max 3 octets, non-repeatable, alpha
                "SubCategories"         => $iptc["2#020"],    # Max 32 octets, repeatable, alphanumeric
                "Keywords"              => $iptc["2#025"],    # Max 64 octets, repeatable, alphanumeric
                "Instructions"          => $iptc["2#040"][0], # Max 256 octets, non-repeatable, alphanumeric
                "CreationDate"          => $iptc["2#055"][0], # Max 8 octets, non-repeatable, numeric, YYYYMMDD
                "CreationTime"          => $iptc["2#060"][0], # Max 11 octets, non-repeatable, numeric+-, HHMMSS(+|-)HHMM
                "ProgramUsed"           => $iptc["2#065"][0], # Max 32 octets, non-repeatable, alphanumeric
                "Author"                => $iptc["2#080"][0], #!Max 32 octets, repeatable, alphanumeric
                "Position"              => $iptc["2#085"][0], #!Max 32 octets, repeatable, alphanumeric
                "City"                  => $iptc["2#090"][0], # Max 32 octets, non-repeatable, alphanumeric
                "State"                 => $iptc["2#095"][0], # Max 32 octets, non-repeatable, alphanumeric
                "Country"               => $iptc["2#101"][0], # Max 64 octets, non-repeatable, alphanumeric
                "TransmissionReference" => $iptc["2#103"][0], # Max 32 octets, non-repeatable, alphanumeric
                "Headline"              => $iptc["2#105"][0], # Max 256 octets, non-repeatable, alphanumeric
                "Credit"                => $iptc["2#110"][0], # Max 32 octets, non-repeatable, alphanumeric
                "Source"                => $iptc["2#115"][0], # Max 32 octets, non-repeatable, alphanumeric
                "Copyright"             => $iptc["2#116"][0], # Max 128 octets, non-repeatable, alphanumeric
                "Caption"               => $iptc["2#120"][0], # Max 2000 octets, non-repeatable, alphanumeric
                "CaptionWriter"         => $iptc["2#122"][0], # Max 32 octets, non-repeatable, alphanumeric
            );
            $IPTC_data = strip_IPTC($IPTC_data);     // sanitize data against sql/html injection; trim any nongraphical non-ASCII character
            $IPTC_data = filter_content($IPTC_data); // run the data against the bad word list
        }
    }
    return $IPTC_data;
}
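
To check which keywords are physically stored in a file's IPTC block, independent of any viewer or cataloging tool, the keyword tag (2#025) can be dumped directly with the same PHP functions used above. The following is a minimal sketch, not Coppermine code; the function names iptc_keywords_from_block and iptc_keywords_from_jpeg are hypothetical.

```php
<?php
/**
 * Hypothetical helper: return all keyword entries (IPTC tag 2#025)
 * found in a raw IPTC block, or an empty array if there are none.
 */
function iptc_keywords_from_block(string $block): array
{
    $iptc = iptcparse($block); // returns false when nothing can be parsed
    if (!is_array($iptc) || !isset($iptc['2#025'])) {
        return [];
    }
    return array_values($iptc['2#025']);
}

/**
 * Hypothetical helper: read the APP13 segment of a JPEG and return
 * the keywords stored in its IPTC block.
 */
function iptc_keywords_from_jpeg(string $filename): array
{
    $info = [];
    // getimagesize() fills $info with the APPn segments of the file
    if (@getimagesize($filename, $info) === false || !isset($info['APP13'])) {
        return [];
    }
    return iptc_keywords_from_block($info['APP13']);
}
```

Running print_r(iptc_keywords_from_jpeg('sample.jpg')) on the original file and on a copy (the file name is just a placeholder) shows whether the keyword list in the IPTC block itself differs between the two, before any case-insensitive filtering takes place.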

jaus

My first post shows the full list of keywords and the ones that were read by CPG.

However, I just did a controlled test, removing BreezeBrowser from the loop, and CPG captured the full list of keywords when importing the image. It is starting to appear that BreezeBrowser may be the culprit, and I may have used BB to copy the file that CPG imported initially. It could be that I am using an older version of BB which has trouble reading the files of newer cameras. I'll use IMatch or PS to copy files into the CPG album folders in the future.