Removing duplicates from gallery.

Started by remdex, June 15, 2008, 09:00:13 AM

remdex

Hi,

I wanted to share a simple script that removes duplicate images from the gallery. The script does not depend on the gallery itself and should be run through the console.
To speed the script up, you can add an index on the filesize column: "ALTER TABLE `cpg<coppermine version>_pictures` ADD INDEX ( `filesize` )". This is not necessary, but if you do it the script will run faster.
You should change these settings depending on your gallery configuration.

$HomePath = './albums/';
$TableName = 'cpg1410_pictures';
$DatabaseName = 'copermine';
$DBHost = 'localhost';
$DBUser = '<enter your username here>';
$DBPaswd = '<enter password>';


Also, the default limit is set to 150000 records.
LIMIT 0,150000
For testing purposes you can change it to 1.

The script should be run like "php -f removeduplicates.php", where removeduplicates.php contains the script body from below.
Script body:

<?php
/**
 * Script for removing duplicates from the database
 */

/**
 * User settings
 */
$HomePath = './albums/';
$TableName = 'cpg1410_pictures';
$DatabaseName = 'copermine';
$DBHost = 'localhost';
$DBUser = '<enter your username here>';
$DBPaswd = '<enter password>';
/** END USER SETTINGS. Do not change below. **/

$connect = mysql_connect($DBHost, $DBUser, $DBPaswd) or die("error");
mysql_select_db($DatabaseName);
echo "Connected to database\n";

// Find every file size that occurs more than once.
$sql = "SELECT pid, filesize, count( * ) AS n FROM $TableName GROUP BY filesize HAVING n >1 LIMIT 0,150000";
$result = mysql_query($sql);

while ($row = mysql_fetch_assoc($result))
{
    // Fetch all records with this file size, oldest first.
    $SQL = "SELECT pid,ctime,aid,filepath,filename FROM $TableName WHERE filesize = {$row['filesize']} ORDER BY ctime";
    $resultDuplicates = mysql_query($SQL);

    $Original = array();
    while ($rowDuplicate = mysql_fetch_assoc($resultDuplicates))
    {
        // The oldest record is treated as the original.
        if (count($Original) == 0) { $Original = $rowDuplicate; continue; }

        if (!file_exists($HomePath.$Original['filepath'].$Original['filename'])) //Original does not exist. Critical ERROR
        {
            echo "ERROR original does not exist - \n";
            echo "Original \n -";
            print_r($Original);
            echo "Duplicate \n -";
            print_r($rowDuplicate);
            exit;
        }
        elseif (!file_exists($HomePath.$rowDuplicate['filepath'].$rowDuplicate['filename'])) //Duplicate does not exist
        {
            echo "-----------------------------------------------------\n";
            if (is_dir(dirname($HomePath.$rowDuplicate['filepath'].$rowDuplicate['filename']))) //Duplicate directory exists
            {
                echo "Deleting duplicate DB record \n";
                $SQL = "DELETE FROM $TableName WHERE pid = {$rowDuplicate['pid']}";
                mysql_query($SQL);
            }
            else
            {
                echo "ERROR Duplicate does not exist - \n";
                echo "Original \n -";
                print_r($Original);
                echo "Duplicate \n -";
                print_r($rowDuplicate);
                exit;
            }
        }
        elseif ($Original['filepath'].$Original['filename'] == $rowDuplicate['filepath'].$rowDuplicate['filename'])
        {
            echo "-----------------------------------------------------\n";
            echo "Duplicate points to the same file. Deleting only DB record \n";
            $SQL = "DELETE FROM $TableName WHERE pid = {$rowDuplicate['pid']}";
            mysql_query($SQL);
        }
        elseif ($Original['filepath'].$Original['filename'] != $rowDuplicate['filepath'].$rowDuplicate['filename'] && sha1_file($HomePath.$Original['filepath'].$Original['filename']) == sha1_file($HomePath.$rowDuplicate['filepath'].$rowDuplicate['filename']))
        {
            echo "-----------------------------------------------------\n";
            $SQL = "DELETE FROM $TableName WHERE pid = {$rowDuplicate['pid']}";
            mysql_query($SQL);

            echo "Deleting duplicate file - ".$rowDuplicate['pid'].' '.$rowDuplicate['filepath'].$rowDuplicate['filename']."\n";
            echo "Original file - ".$HomePath.$Original['filepath'].$Original['filename']."\n";

            // Full path of the duplicate image that is about to be removed.
            $DuplicateFilename = $HomePath.$rowDuplicate['filepath'].$rowDuplicate['filename'];

            if (unlink($DuplicateFilename))
            {
                echo "OK\n";
            }
            else
            {
                echo "FAIL\n";
            }

            $NormalThumbnail = $HomePath.$rowDuplicate['filepath'].'normal_'.$rowDuplicate['filename'];
            if (file_exists($NormalThumbnail))
            {
                echo "Normal thumbnail found. Proceeding to delete \n";
                if (unlink($NormalThumbnail))
                    echo "OK\n";
                else
                    echo "FAIL\n";
            }
            else
            {
                echo "Normal thumbnail not found, skipping \n";
            }

            $smallThumbnail = $HomePath.$rowDuplicate['filepath'].'thumb_'.$rowDuplicate['filename'];
            if (file_exists($smallThumbnail))
            {
                echo "Small thumbnail found. Proceeding to delete \n";
                if (unlink($smallThumbnail))
                    echo "OK\n";
                else
                    echo "FAIL\n";
            }
            else
            {
                echo "Small thumbnail not found, skipping \n";
            }
        }
        else //Sizes match but sha1 sum does not match. So images are different
        {
            echo "Skipping -> ".$rowDuplicate['pid']."\n";
        }
    }
}

?>



I hope someone will find it useful. I ran the script on a gallery containing 150 000 records. It ran fine and deleted over 25 000 duplicates.
Important: you should back up the database and the gallery in case something goes wrong.
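
A note for newer PHP versions: the mysql_* functions the script relies on were removed in PHP 7.0. The first query could be run through mysqli instead, roughly as in the sketch below. This is only an illustration, not part of the original script: the helper names buildDuplicateSizeQuery and findDuplicateSizes are made up here, and only the mysqli_* calls are real PHP functions. The sketch also leaves the unused pid column out of the SELECT, on the assumption that a stricter MySQL mode (ONLY_FULL_GROUP_BY) may reject it in a GROUP BY query.

```php
<?php
// Hypothetical helpers for illustration; not part of the original script.

/** Build the "file sizes that occur more than once" query. */
function buildDuplicateSizeQuery(string $tableName, int $limit): string
{
    // Table names cannot be bound as parameters, so only allow safe characters.
    if (!preg_match('/^[A-Za-z0-9_]+$/', $tableName)) {
        throw new InvalidArgumentException("Unexpected table name: $tableName");
    }
    return "SELECT filesize, COUNT(*) AS n FROM `$tableName` "
         . "GROUP BY filesize HAVING n > 1 LIMIT 0, $limit";
}

/** Run the query over an open mysqli connection and return the duplicate sizes. */
function findDuplicateSizes(mysqli $db, string $tableName, int $limit = 150000): array
{
    $result = mysqli_query($db, buildDuplicateSizeQuery($tableName, $limit));
    if ($result === false) {
        throw new RuntimeException(mysqli_error($db));
    }
    $sizes = [];
    while ($row = mysqli_fetch_assoc($result)) {
        $sizes[] = (int) $row['filesize'];
    }
    mysqli_free_result($result);
    return $sizes;
}

// Usage sketch (needs a reachable database, so it is left commented out):
// $db = mysqli_connect($DBHost, $DBUser, $DBPaswd, $DatabaseName);
// $sizes = findDuplicateSizes($db, $TableName, 1); // limit 1 for a first test
```

The rest of the script would need the same kind of change: mysqli_query and mysqli_fetch_assoc with the connection passed in, in place of mysql_query and mysql_fetch_assoc.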

net

I have a question about how this script knows it's a dupe. Does it check the size or just the filename? I mean, if only the filename is the same, it is not necessarily a dupe.

Hein Traag

//Sizes match but sha1 sum does not match. So images are different

That does the trick: it compares the two pictures that seem to be duplicates.
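
To spell the check out: the script first groups records by file size, and only then compares the SHA-1 checksums of the files themselves, so the filename alone never decides whether something is a dupe. The comparison could be sketched as a small standalone function like this one. The name filesLookIdentical is hypothetical and does not appear in the script; filesize and sha1_file are standard PHP functions.

```php
<?php
// Hypothetical helper; it mirrors the size-then-SHA-1 comparison used in the script.

/**
 * Return true when two files have the same size and the same SHA-1 checksum.
 * The cheap size check runs first, so files are only hashed when needed.
 */
function filesLookIdentical(string $pathA, string $pathB): bool
{
    if (!is_file($pathA) || !is_file($pathB)) {
        return false; // A missing file can never be confirmed as a duplicate.
    }
    if (filesize($pathA) !== filesize($pathB)) {
        return false; // Different sizes: the contents must differ.
    }
    // Same size: compare checksums of the full file contents.
    return sha1_file($pathA) === sha1_file($pathB);
}

// Usage sketch with throwaway files in the system temp directory.
$a = tempnam(sys_get_temp_dir(), 'dup');
$b = tempnam(sys_get_temp_dir(), 'dup');
$c = tempnam(sys_get_temp_dir(), 'dup');
file_put_contents($a, 'same bytes');
file_put_contents($b, 'same bytes');
file_put_contents($c, 'SAME BYTES'); // Same size, different content.

var_dump(filesLookIdentical($a, $b)); // bool(true)
var_dump(filesLookIdentical($a, $c)); // bool(false)

unlink($a);
unlink($b);
unlink($c);
```

Two files with different names but identical bytes count as duplicates, while two files that merely share a size are skipped.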