What about Similar Images with metric distance functions?
 


Started by da4walker, November 24, 2009, 08:30:21 AM


da4walker

Hi folks,
there are several methods for finding similar images using color histograms. They work reasonably well, and they are really interesting when you have a lot of family pictures...
It works like this:
You select a picture and want to find other pictures that are similar to it. A signature is created for the selected picture and compared to the signatures stored in the database for all the other pictures.
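
A minimal sketch of that signature-and-compare idea, assuming each picture has already been decoded into a list of (r, g, b) tuples with values 0-255. The function names, the choice of 4 bins per channel and the Euclidean comparison are illustrative assumptions, not Coppermine or GD API:

```python
# Sketch: colour-histogram signatures and nearest-match ranking.
# Assumes images are already decoded into lists of (r, g, b) tuples (0-255);
# all names here are hypothetical, not part of Coppermine or GD.
from math import sqrt


def histogram_signature(pixels, bins_per_channel=4):
    """Return a normalized histogram: bins for R, then G, then B."""
    hist = [0] * (3 * bins_per_channel)
    for r, g, b in pixels:
        # Map a 0-255 value to a bin index in 0..bins_per_channel-1.
        hist[r * bins_per_channel // 256] += 1
        hist[bins_per_channel + g * bins_per_channel // 256] += 1
        hist[2 * bins_per_channel + b * bins_per_channel // 256] += 1
    n = len(pixels)
    # Divide by the pixel count so pictures of different sizes are comparable.
    return [count / n for count in hist]


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(query_sig, database, k=3):
    """database maps picture name -> stored signature; returns the k closest names."""
    return sorted(database, key=lambda name: euclidean(query_sig, database[name]))[:k]


# Toy "pictures" as pixel lists.
red_photo = [(250, 10, 10)] * 90 + [(20, 20, 20)] * 10
reddish = [(230, 40, 30)] * 80 + [(30, 30, 30)] * 20
blue_photo = [(10, 10, 240)] * 100

database = {
    "reddish.jpg": histogram_signature(reddish),
    "blue.jpg": histogram_signature(blue_photo),
}
print(most_similar(histogram_signature(red_photo), database, k=2))
# prints ['reddish.jpg', 'blue.jpg']
```

In a real gallery the signatures in `database` would be computed once at upload time and read back from the database, so only the query picture needs a fresh histogram.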

I am not asking anyone to implement this feature; I think I could implement it myself.
So is there interest in such a feature?
If so, I would contact some developers here to discuss the details.

cu

Joachim Müller

There is a mod that creates histograms: http://forum.coppermine-gallery.net/index.php/topic,18759.0.html
The EnlargeIt plugin by Timo features a histogram button as well. What exactly do you propose? Do you propose to create a histogram on upload and store it in the database, so that code can search the database for identical or similar images based on the histogram? That would be a cool feature, but I'm not sure about the impact on resources, which may be huge.
Anyway, I'm interested to hear more, please elaborate.

da4walker

Hi Joachim,

As you already noticed, the intention is the following:
- On each picture upload, a histogram vector is created and stored in the database (e.g. v = (n1, n2, n3, ...)).
  Of course, when we later want to search for the most similar pictures, we have to calculate a metric distance (e.g. Euclidean, Manhattan, and so on) to every vector in the database. Although this operation is linear in complexity, it would take a really long time if you have thousands of pictures in your database.
- To speed this up, I would suggest calculating a single value for each vector, storing it in the database too, and indexing it with a B-tree or something similar. This value could be the Euclidean length of the vector itself: value = (n1^2 + n2^2 + n3^2 + ...)^(1/2). Now comes the great thing about this value (at least I think so): if two pictures have similar histograms, they will surely have similar values (the difference between the two values is small). Of course there will also be many pictures that are not similar but still have a similar value. That doesn't matter, because we will still reduce the result set to just a fraction of all the pictures in the database.
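
The two distance functions named above and the single indexable value can be sketched as follows. The vectors are toy 3-bin examples rather than real histograms; they show that two different histograms can share exactly the same value, which is why the value can only pre-filter and never replace the real distance:

```python
# Sketch of the distance functions and the single indexable value.
from math import sqrt


def euclidean(a, b):
    """L2 distance between two histogram vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """L1 distance: sum of absolute bin differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def norm_key(v):
    """value = (n1^2 + n2^2 + ...)^(1/2): the Euclidean length of v,
    i.e. its distance from the all-zero vector. One number per picture,
    so it fits into an ordinary indexed database column."""
    return sqrt(sum(x * x for x in v))


a = [3, 4, 0]
b = [4, 3, 0]  # a different histogram ...
print(norm_key(a), norm_key(b))  # 5.0 5.0 ... with the same value
print(manhattan(a, b), euclidean(a, b))  # 2 and about 1.414
```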

We first determine the pictures that are similar with respect to this single calculated value. Let's say we get 30 pictures as the result out of thousands. Because the value described above should give a lower bound on the distance, it is guaranteed that the most similar pictures will be among these 30 pictures. Then we do a more expensive similarity calculation, which is fine for a small number of pictures, and end up with the most similar pictures.
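
A sketch of this filter-and-refine step, assuming the final comparison is the Euclidean distance on the same vectors and that we search within a fixed radius. The filter is safe because of the reverse triangle inequality, |‖q‖ − ‖v‖| ≤ ‖q − v‖: any picture within the radius must also have its value within the radius of the query's value. The guarantee does not automatically carry over to other similarity measures, or to a filter window that is chosen independently of the search radius (for example "the 30 nearest values"). All names are hypothetical:

```python
# Filter-and-refine range search using the vector length as the cheap key.
# Assumes Euclidean distance and a fixed search radius; names are hypothetical.
from math import sqrt


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def norm_key(v):
    return sqrt(sum(x * x for x in v))


def build_keys(vectors):
    """Precompute the key once per picture (in practice: at upload time)."""
    return {vid: norm_key(v) for vid, v in vectors.items()}


def range_search(query, vectors, keys, radius):
    """Return (ids within `radius` of `query` sorted by distance, candidate count)."""
    qkey = norm_key(query)
    # Filter: |key(q) - key(v)| <= ||q - v||, so anything within `radius`
    # also passes this cheap test (no true match is dropped).
    candidates = [vid for vid, key in keys.items() if abs(key - qkey) <= radius]
    # Refine: exact distance only for the survivors.
    scored = sorted((euclidean(query, vectors[vid]), vid) for vid in candidates)
    return [vid for dist, vid in scored if dist <= radius], len(candidates)


vectors = {"a": [3, 4, 0], "b": [4, 3, 0], "c": [0, 0, 9], "d": [3, 4, 1]}
keys = build_keys(vectors)
print(range_search([3, 4, 0], vectors, keys, radius=1.5))
# (['a', 'd', 'b'], 3): "c" is rejected by the key alone
```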

If this value is stored with a tree index, which should be possible in MySQL, this approach could work well.
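
A sketch of storing the value in an indexed column and letting the database answer the range query. Here `sqlite3` from the Python standard library stands in for MySQL, which accepts the same `CREATE INDEX` and `BETWEEN` statements and whose default InnoDB indexes are B-trees. The table, column and file names are hypothetical, and only the filter step is shown:

```python
# Sketch: keep the per-picture key in an indexed column and let the
# database's B-tree answer the range query. sqlite3 stands in for MySQL;
# table and column names are hypothetical.
import sqlite3
from math import sqrt


def norm_key(v):
    return sqrt(sum(x * x for x in v))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pictures (id INTEGER PRIMARY KEY, filename TEXT, norm REAL)")
conn.execute("CREATE INDEX idx_pictures_norm ON pictures (norm)")

histograms = {  # toy vectors; a real table would also store the full vector
    "beach.jpg": [3, 4, 0],
    "sunset.jpg": [4, 3, 0],
    "forest.jpg": [0, 0, 9],
    "red.jpg": [3, 4, 1],
}
conn.executemany(
    "INSERT INTO pictures (filename, norm) VALUES (?, ?)",
    [(name, norm_key(v)) for name, v in histograms.items()],
)


def candidate_files(query, radius):
    """Filter step only: pictures whose key lies within `radius` of the query's key."""
    q = norm_key(query)
    rows = conn.execute(
        "SELECT filename FROM pictures WHERE norm BETWEEN ? AND ? ORDER BY norm, id",
        (q - radius, q + radius),
    )
    return [filename for (filename,) in rows]


print(candidate_files([3, 4, 0], 1.5))  # ['beach.jpg', 'sunset.jpg', 'red.jpg']
```

The returned candidates would then go through the exact distance calculation in PHP (or whatever language the plugin uses).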

Here is a PDF from a lecture where the idea of distance functions is described in a bit more detail.

Any questions on this?

cu


Αndré


da4walker

So does anyone know which library I could use to calculate color histograms from pictures in PHP? I read some of the GD library documentation, but it seems to be useful only for creating images...
I need some hints on that so I can start this in PHP. I will then write such a script and see whether it also runs at a good speed.

cu


da4walker

OK, I found it: the GD library has at least one function to get pixel information:
int gdImageGetPixel(gdImagePtr im, int x, int y)
So I have to create the histograms myself, which should just be a bit of coding and not difficult.
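
A sketch of building the per-channel histograms from the integer returned for each pixel. It assumes a truecolor image, where GD packs each pixel as 0xAARRGGBB with a 7-bit alpha; for palette images the returned value is a palette index instead, and the colour has to be looked up separately. In PHP the values would come from `imagecolorat($im, $x, $y)` in a loop over all pixels; here a few literal values stand in for that loop, and the function names and 3-bin default are illustrative:

```python
# Sketch: turn packed GD truecolor pixel values into per-channel histograms.
# Assumes a truecolor image (0xAARRGGBB, 7-bit alpha); palette images return
# a palette index instead. Function names are hypothetical.


def unpack_truecolor(pixel):
    """Split a packed truecolor value into its (red, green, blue) components."""
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF


def channel_histograms(pixels, bins=3):
    """Count pixels per bin for each channel; `bins` equal-width bins over 0-255."""
    hist = {"red": [0] * bins, "green": [0] * bins, "blue": [0] * bins}
    for pixel in pixels:
        r, g, b = unpack_truecolor(pixel)
        hist["red"][r * bins // 256] += 1
        hist["green"][g * bins // 256] += 1
        hist["blue"][b * bins // 256] += 1
    return hist


# Stand-in for the values read pixel by pixel from the image.
sample = [0xFF0000, 0x00FF00, 0x0000FF, 0x808080]
print(channel_histograms(sample))
# {'red': [2, 1, 1], 'green': [2, 1, 1], 'blue': [2, 1, 1]}
```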

cu

phill104

Take a look at the EnlargeIt plugin ( http://forum.coppermine-gallery.net/index.php/topic,57424.0.html ), as it generates histograms in PHP. It might save you a bit of time reinventing the wheel.

da4walker

Oh yeah, I should have read Joachim's post more carefully; he already mentioned this.
I found the code in the EnlargeIt plugin, which I think will speed up my work a bit :)

Thanks for the hint.
cu

Joachim Müller

The EnlargeIt plugin basically just uses the histogram code taken from the mod I referred to initially, so you don't actually have to look at the EnlargeIt plugin (its main purpose lies somewhere else); just look at the mod (Histogram added.).
By coincidence, I have been working on the histogram part of the EnlargeIt plugin for cpg1.5.x for the past three days (adding an option to the plugin to cache the histogram images properly and maintain that cache efficiently).
Keep in mind, though, that GD2 (which is needed for that mod) is not available everywhere and that it's really a resource hog. As you don't need to actually create graphics, but are just interested in coming up with a vector and a value calculated from that vector, the resource consumption should be negligible.
I can see what you're trying to do now, and as far as I can see, it could be accomplished by creating a plugin (instead of a mod, which basically is a hack of Coppermine's core code). Maybe you will need some additional plugin hooks, but that would be acceptable.
The really tricky part is coming up with the search queries in the end to refine your search.

da4walker

Hi folks,

After coding for a while, I have managed to create a working standalone system.
I created a JApplet to upload complete folders of pictures, which are indexed into the database.
On an HTML page you can select another picture to upload and compare. After that, all similar pictures (within a certain range) are shown, ordered by their similarity.
For some pictures it works really well, for others not so well. But it would be really cool as a plugin that suggests similar pictures....

I attached an example result. The algorithm just uses the histograms of the pictures, which are stored as vectors.
This example takes under 2 seconds with a database of 1000 pictures.
To reach this speed, I made small histograms with just 3 bins for each color (red, green, blue), which are stored and indexed in the database. My first idea, using the Euclidean length of the vector, didn't work because it wasn't a lower bound and lost some good results.
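
The post does not say how the 3-bin histograms are queried, so the following is only one possible reading, assuming the final comparison is the Manhattan (L1) distance on finer histograms whose adjacent bins are merged into the 3 coarse ones. Under that assumption each coarse bin can differ by at most the fine L1 distance, so every stored coarse bin can be range-checked independently (one `BETWEEN` per indexed column) without losing true matches. All names are hypothetical:

```python
# Sketch: coarse 3-bin channel histograms as a safe pre-filter.
# Assumption: the final comparison is the Manhattan (L1) distance on finer
# histograms whose bins are grouped into the coarse ones. Names are hypothetical.


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def coarsen(channel_hist, groups=3):
    """Merge adjacent fine bins into `groups` coarse bins (length must divide evenly)."""
    size = len(channel_hist) // groups
    return [sum(channel_hist[i * size:(i + 1) * size]) for i in range(groups)]


def coarse_signature(red, green, blue):
    """9 values (3 per channel), small enough to store as indexed columns."""
    return coarsen(red) + coarsen(green) + coarsen(blue)


def passes_bin_filter(query_coarse, stored_coarse, radius):
    # |sum(a_i) - sum(b_i)| <= sum(|a_i - b_i|) for each group, so every coarse
    # bin differs by at most the fine L1 distance. A picture within `radius`
    # therefore has each coarse bin inside [q - radius, q + radius].
    return all(abs(q - s) <= radius for q, s in zip(query_coarse, stored_coarse))


fine_a = [4, 0, 0, 0, 0, 2]
fine_b = [0, 4, 0, 0, 2, 0]
print(coarsen(fine_a), coarsen(fine_b))  # [4, 0, 2] [4, 0, 2]
print(manhattan(fine_a, fine_b))  # 12: identical coarse bins, different fine ones
```

The second example shows the price of such a coarse filter: pictures with identical coarse bins can still be far apart, so the exact comparison on the surviving candidates remains necessary.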

I have another idea to make this algorithm much faster and will test it in the next few days.

If someone has some pictures (not too big each), they could upload them somewhere so that I have more pictures to test with. I am thinking of at least a few tens of thousands of pictures, so that I can say something about performance on a big database.

The only thing you need on the server side so far is the GD library for calculating the histograms. But even if you don't have that library, you could calculate the vectors offline on your computer and put them into the database with a link to the picture.

Once this is working, I will go through the code and comment it a bit, but then I would need some help from you, because I have no idea how to create a plugin for Coppermine.
Maybe one of you could do that part; I could write a set of functions for all these algorithms so that they are easy to use.

What do you think about this?

Αndré

The algorithm has to be accelerated a lot to avoid timeout problems on larger galleries. I have a gallery with ~75k pictures on a free webhost (funpic.de), where we can test your plugin once it's created :)