HOWTO: Optimizing cpg's frontpage for galleries with many pics and albums
 


Started by GGallery, June 14, 2004, 03:25:59 AM



GGallery

I made some initial modifications to cpg 1.3.1 to deliver quick frontpage results for lastup/random/lastcom.

Our gallery's frontpage got pretty slow once we reached 5k pics in 2k user albums. I then simplified the frontpage to just 'lastup', but as of today (13k pics) it started taking 2-3 seconds to load.

Note: the user albums themselves scale very nicely and I don't see any reason why cpg shouldn't be able to handle a couple million pics... it's only the frontpage that gets slower and slower the more pics (and albums) are in the database. Here's an example of a simple hack (far from elegant, but it works) to make the frontpage very fast as well:

I first added two new fields to the pictures and comments tables:
alter table cpg11d_pictures add is_recent enum ('YES', 'NO') not null;
alter table cpg11d_pictures add KEY (is_recent);
alter table cpg11d_comments add is_recent enum ('YES', 'NO') not null;
alter table cpg11d_comments add KEY (is_recent);
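To check that MySQL actually uses the new key (table names as in the statements above), EXPLAIN helps; this is just a sketch of the kind of frontpage query the index is meant to serve:

```sql
-- The 'key' column of the EXPLAIN output should name the is_recent index
EXPLAIN SELECT pid
FROM cpg11d_pictures
WHERE is_recent = 'YES'
ORDER BY pid DESC
LIMIT 15;
```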

and seeded the config table (the timestamp far in the past makes sure the first frontpage hit triggers the update below right away):
insert into cpg11d_config (name, value) values ('is_recent_interval', unix_timestamp() - 100000);

at the very end (around line 746 in index.php) there is a comment:
   // Speed-up the random image query by 'keying' the image table
after which I added some code:

    if (time() - $CONFIG['is_recent_interval'] > 3600*4) {
        // Update is_recent_interval to the current time right away so that
        // other clients don't execute the following updates as well
        $result = db_query("UPDATE {$CONFIG['TABLE_CONFIG']} SET value = '" . time() . "' WHERE name = 'is_recent_interval'");
        $result = db_query("UPDATE {$CONFIG['TABLE_PICTURES']} SET is_recent = 'NO' WHERE (unix_timestamp() - ctime) > 7200");
        $result = db_query("UPDATE {$CONFIG['TABLE_COMMENTS']} SET is_recent = 'NO' WHERE msg_date < DATE_SUB(NOW(), INTERVAL 4 HOUR)");
    }


==> pics older than 2 hours and comments older than 4 hours get marked as is_recent = 'NO'. These two UPDATE statements run at most every 4 hours, and because I initially set is_recent_interval in the config to unix_timestamp() - 100000, they are executed the first time the frontpage is loaded.
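On newer MySQL the same expiry could run server-side instead of piggybacking on frontpage requests. This is only a sketch under assumptions that did not hold when this thread was written: it requires MySQL 5.1+ with the event scheduler enabled, and the table/column names are taken from the statements above:

```sql
-- Sketch only: needs MySQL 5.1+ and the event scheduler turned on
-- (SET GLOBAL event_scheduler = ON; requires the SUPER privilege)
CREATE EVENT expire_recent_pics
ON SCHEDULE EVERY 4 HOUR
DO
  UPDATE cpg11d_pictures
  SET is_recent = 'NO'
  WHERE (UNIX_TIMESTAMP() - ctime) > 7200;
```

An analogous event would handle the comments table, and the is_recent_interval bookkeeping in PHP could then be dropped entirely.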

Because my gallery has a pretty simple setup (n users with m albums) and no other categories, I can simplify some things in get_pics_data in include/functions.inc.php:


       switch($album){
       case 'lastcom': // Last comments
...
// GJ_START
               if ($ALBUM_SET || $keyword) {
                       $TMP_SET = str_replace($CONFIG['TABLE_PICTURES'],'p',$ALBUM_SET);
                       $result = db_query("SELECT $select_columns FROM {$CONFIG['TABLE_COMMENTS']} as c, {$CONFIG['TABLE_PICTURES']} as p WHERE approved = 'YES' AND c.pid = p.pid $keyword $TMP_SET ORDER by msg_id DESC $limit");
               }
               else { // frontpage quick hack version
                       $result = db_query("SELECT $select_columns FROM {$CONFIG['TABLE_COMMENTS']} as c, {$CONFIG['TABLE_PICTURES']} as p WHERE c.pid = p.pid AND c.is_recent='YES' ORDER by msg_id DESC $limit");
               }
// GJ_END
...
...

       case 'lastup': // Last uploads
               if ($ALBUM_SET && $CURRENT_CAT_NAME) {
                       $album_name = $lang_meta_album_names['lastup'].' - '. $CURRENT_CAT_NAME;
               } else {
                       $album_name = $lang_meta_album_names['lastup'];
               }
// GJ_START
               if ($ALBUM_SET) {
                       $result = db_query("SELECT COUNT(*) from {$CONFIG['TABLE_PICTURES']} WHERE approved = 'YES' $ALBUM_SET");
               }
               else { // frontpage 'quick hack' - no need to check 'approved' - simple count on table is much faster
                       $result = db_query("SELECT COUNT(*) from {$CONFIG['TABLE_PICTURES']}");
               }
// GJ_END
               $nbEnr = mysql_fetch_array($result);
               $count = $nbEnr[0];
               mysql_free_result($result);

               if($select_columns != '*' ) $select_columns .= ',title, caption, owner_id, owner_name, aid';

// GJ_START
               if ($ALBUM_SET) {
                       $result = db_query("SELECT $select_columns FROM {$CONFIG['TABLE_PICTURES']} WHERE approved = 'YES' $ALBUM_SET ORDER BY pid DESC $limit");
               }
               else { // frontpage 'quick hack'
                       $result = db_query("SELECT $select_columns FROM {$CONFIG['TABLE_PICTURES']} WHERE is_recent='YES' ORDER BY pid DESC $limit");
               }
// GJ_END



==>  I am using is_recent as an index with a much smaller potential result set
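The unfiltered count is also cheap because MyISAM (the default storage engine at the time, assumed here) stores the exact row count in the table header, so a COUNT(*) without a WHERE clause needs no scan at all:

```sql
-- MyISAM answers an unfiltered COUNT(*) from table metadata (no scan)
SELECT COUNT(*) FROM cpg11d_pictures;

-- A filtered count has to walk an index or the rows instead
SELECT COUNT(*) FROM cpg11d_pictures WHERE approved = 'YES';
```

On InnoDB both variants have to count rows, so this shortcut only pays off on MyISAM.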

also in index.php I completely drop the get_subcat_data call:
// GJ_START
if (!$cat) {
   get_subcat_data($cat, $cat_data, $album_set_array, $CONFIG['subcat_level']);
}
// GJ_END
(this block is in get_cat_list)

that call took a lot of time since some 4k album ids got returned (again: my gallery setup is very simple - don't do this if you have categories!)

for the random function I did this (in functions.inc.php):


               if ($ALBUM_SET) {
                       $result = db_query("SELECT COUNT(*) from {$CONFIG['TABLE_PICTURES']} WHERE approved = 'YES' $ALBUM_SET");
               }
                else { // frontpage 'quick hack' - NOTE: this returns the highest pid, which is most likely not the exact number of pics
                       $result = db_query("SELECT MAX(pid) from {$CONFIG['TABLE_PICTURES']}");
               }
...
...
               if($select_columns != '*') $select_columns .= ', aid';

if (!$ALBUM_SET) {
        $RANDOM_MULTIPLE = 2; // fetch (at most) twice as many random pids as requested
        $sql = "SELECT $select_columns FROM {$CONFIG['TABLE_PICTURES']} WHERE pid IN (";
        for ($i = 0; $i < $limit2 * $RANDOM_MULTIPLE; $i++) {
                // pids start at 1, so pick random ids from 1..$pic_count
                $sql .= ($i ? ',' : '') . rand(1, $pic_count);
        }
        $sql .= ") LIMIT $limit2";
        $result = db_query($sql);
}
else {
               // if we have more than 1000 pictures, we limit the number of picture returned
...
}
               $rowset = array();
               while($row = mysql_fetch_array($result)){
...


==> to speed up the random function I am simply doing this:
- getting the highest pid in the picture table
- fetching (at most) twice as many pics as requested by randomly selecting pids between 1 and max_pid. That returns the requested $limit2 results in 99% of all cases and it's very fast (if it returns 14 instead of the 15 requested results once in a while... who cares...)
(once a gallery gets really huge it'll make sense to randomly select only pids with a modulo of the weekday to optimize caching in the DB/picserver)
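For illustration, with $limit2 = 4 the PHP loop above would build a statement roughly like this (the pid values are of course random, and the column list is shortened here; the real $select_columns comes from the caller):

```sql
-- Hypothetical output of the loop for $limit2 = 4, max pid around 13000
SELECT pid, aid
FROM cpg11d_pictures
WHERE pid IN (812, 4907, 11243, 771, 9010, 2344, 5598, 12001)
LIMIT 4;
-- Deleted pids simply match no row, which is why twice as many ids
-- are requested as results are needed
```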

OK, so this made the frontpage very fast - but once again, only modify your code like this if you have a very simple gallery setup and if you don't care about the 'approved' status (here, pics simply get deleted right away if they are not 'approved' ;)
If you have a more complex setup you might use this as a suggestion for where to start optimizing things...

Tarique Sani

Great!!
@Oasis - can you add to / refute any of the above
@GGallery - can you profile the code and post what *really* is slow rather than what is perceived as slow OR obviously slow. Use APD for profiling; we desperately need profiling data from large setups of coppermine.
SANIsoft PHP applications for E Biz

GGallery

jepp, sorry for not going more into that - I copied & pasted some of the queries into a mysql client to see what takes a lot of time (is there any way to get the time consumed per db query in debug mode?)

      $sql = "SELECT sum(hits) FROM {$CONFIG['TABLE_PICTURES']} as p ".
                'LEFT JOIN '.$CONFIG['TABLE_ALBUMS'].' as a '.
                'ON p.aid=a.aid '.
                'WHERE a.category>=0'.$pic_filter;

uhhh ;) that was the first one I deactivated; it took about 1 sec+ and my template doesn't need it anyway... all the gallery stats are deactivated if $cat < 1


          $result = db_query("SELECT aid FROM {$CONFIG['TABLE_ALBUMS']} as a WHERE category >= " . FIRST_USER_CAT.$album_filter);

that was the other long-running query (1 to 2 secs)... it returned several k aids which were then put into $ALBUM_SET and into the other queries in lastup/random/lastcom

in a nutshell:
- for the frontpage I am not bothering with collecting subcat data or any statistics... these parts are deactivated with:
if ($cat > 1) {
    // Add the albums in the current category to the album set
in get_cat_list

that and restricting the result sets with the is_recent=YES flag makes the frontpage flying fast and as scalable as the rest.

Also I just modified init.inc.php a little bit (line 427):

// Remove expired bans

if (defined('DB_INPUT_PHP')) {
    $now = time();
    db_query("DELETE FROM {$CONFIG['TABLE_BANNED']} WHERE expiry < $now");
...

I am not bothering with banned users as long as they are not trying to upload something. (For IP bans, iptables or a firewall makes more sense anyway; IP bans should always be temporary btw... I once locked out 10% of my users by banning an AOL proxy... very dangerous.) That saves another 2 or 3 queries.

There is another potential issue:
recording 'hits' in the picture table... a gallery getting a lot of hits (one every second or more) will run into ugly mysql table locks... a separate 'hits' table would probably make sense in the long run.
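A sketch of what such a hits table could look like (all names here are made up, and the fold-back UPDATE assumes a MySQL version with subquery support, i.e. 4.1+):

```sql
-- Hypothetical append-only hit log: views do a cheap INSERT instead of
-- an UPDATE that locks the whole pictures table
CREATE TABLE cpg11d_hits (
  pid   INT NOT NULL,
  htime INT NOT NULL,
  KEY (pid)
);

-- per page view:
INSERT INTO cpg11d_hits (pid, htime) VALUES (4711, UNIX_TIMESTAMP());

-- periodic job: fold the counts back, then clear the log (in practice
-- you would aggregate only rows up to a cutoff time so hits arriving
-- between the two statements are not lost)
UPDATE cpg11d_pictures p
SET p.hits = p.hits + (SELECT COUNT(*) FROM cpg11d_hits h WHERE h.pid = p.pid)
WHERE EXISTS (SELECT 1 FROM cpg11d_hits h2 WHERE h2.pid = p.pid);
DELETE FROM cpg11d_hits;
```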

Anyways: love working with you guys' code! it's a real pleasure... very well done.

Tarique Sani

In the long run MySQL will have row level locking on every server ;)

GGallery

they better do, yes... because that's usually the biggest problem for big installations... slow queries can usually be spread over replicating servers or the results can be cached in the application somehow (mysql query caching doesn't make sense anymore once there are too many updates)
So a 'stupid' update of e.g. a 'lastlogin' field in the user table can lock all the other reads on the user table => clients pile up, users hit 'refresh' and the nightmare begins.
That's why I am pretty scared of recording the hits in the pictures table, also because it contains varchars.
You can prevent locks with HANDLER or minimize them a little bit with LOW_PRIORITY but that's not that nice either.
So profiling of mysql queries can sometimes lead to wrong assumptions unless you record all of them in a real-world installation (the slow query log can tell a lot and might all of a sudden show 15.34 secs for something as simple as 'UPDATE table SET hits=hits+1' :)
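For reference, enabling the slow query log in that era's MySQL was a my.cnf change. Option names as in MySQL 4.x/5.0 (newer versions use slow_query_log instead); the log path is just an example:

```ini
# my.cnf fragment - log every statement slower than 2 seconds
[mysqld]
log-slow-queries = /var/log/mysql/mysql-slow.log
long_query_time  = 2
```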