Resource Usage, bot blocking, etc
 


Started by Walkinman, April 20, 2012, 09:42:42 PM

Walkinman

Hey Folks,

I'm trying to lower my CPU usage for my site, www.skolaiimages.com. I'm on a shared server and this is the 2nd webhost telling me my site is using too many resources on the CPU. The problem MIGHT be that I just need to move to a VPS server, but I don't think that's the issue. The bulk of my traffic is web bots, crawling around the site. I've setup a robots.txt file to block them from some of the extraneous wordpress files/directories, and am wondering if there are any suggestions for doing a similar thing with the coppermine section of the site.

I may well be barking up the wrong tree, but I'm at my wit's end in trying to figure this problem out.

I've done the usual: cache plugin on the wordpress platform, etc. I'm still fiddling with it a little, but thought I'd ask here if there is any advice for the coppermine stuff. Is this an issue with the way coppermine runs?

I'm running the SEF plugin, File Replacer, Add to Lightbox and Panorama Viewer plugins.

Are there any known sources of this problem on cpg? I've browsed the forum but not found anything.

I size all my images before uploading; and upload my own thumbnails.

The main things the webhost support folks have pointed to are (a) caching for wordpress and (b) bots crawling the coppermine section of the site.

Thanks so much for any help.

Cheers

Carl

Αndré

Coppermine adds rel="nofollow" to all meta album links out of the box. So the bots should just crawl your categories/albums/files. I don't know what performance penalty the SEF plugin introduces.

There's also a performance section in the docs: http://documentation.coppermine-gallery.net/en/performance.htm

Walkinman

hello Andre

Thank you for the post. I am looking over the performance page as well.

On the 'nofollow' code, here's an example of the links that appear under an image via keywords:

<a href="thumbnails-search-65o4c.html&amp;keywords=on&amp;search=mountain">mountain</a> ;

There's no 'nofollow' there .. I've blocked "thumbnails-search-65o4c.html" via htaccess .. which is a pain, because it'd be nice to have google index meta content. But the resource drain would seem to be intensive.
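
For anyone following along: a softer alternative to the htaccess block would be a robots.txt rule for the keyword-search pages. Below is a minimal sketch that checks how such a rule would be interpreted, using Python's standard urllib.robotparser. The Disallow line, the /stock/ prefix and the bot name are assumptions for illustration, not something Coppermine ships, and well-behaved bots only treat robots.txt as a request.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The /stock/ prefix and the Disallow rule
# are assumptions for illustration; adjust them to the real gallery path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /stock/thumbnails-search
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, path, agent="bingbot"):
    """Return True if the given user agent may fetch the given path."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for path in (
        "/stock/thumbnails-search-65o4c.html",
        "/stock/thumbnails-22-Kenai-Peninsula-photos.html",
    ):
        print(path, "allowed" if is_allowed(parser, path) else "blocked")
```

Matching is by path prefix, so the one rule covers every search page while ordinary album pages stay crawlable.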

Is there a caching program at all for cpg, like the options available for wordpress?

Thank you.

Cheers

Carl

Αndré

I don't know which caching options Wordpress provides, but Coppermine has no built-in cache at all. I don't know which caching options are provided by Apache/IIS, PHP and MySQL and if they could be used easily without modifying Coppermine.

Walkinman

hey André

I can't help much, sorry. I just know there are many cache plugins available for wordpress, and speaking with 2 different webhosts, they say it's pretty much imperative to use one of those on a wordpress platform on a shared hosting environment; my host recommends http://wordpress.org/extend/plugins/wp-super-cache/

I looked over the link you gave me, thanks. Most things are set correctly.

The yslow tool is helpful, for sure. I wish I knew more about how to change the things it points to.

I tried to install the following:

<IfModule mod_expires.c>
    ExpiresActive on
    ExpiresDefault "access plus 2 weeks"
    ExpiresByType text/html "access plus 1 seconds"
</IfModule>


Do I drop that into the htaccess file for the skolaiimages.com/stock directory? or the root directory? (all the cpg-files are in /stock/)

I did that, but I still get "expired headers" results on the Yslow tool. Do I have to customize that code at all? Or just drop it in as is?

Thanks again.

Cheers

Carl

Αndré

As far as I know YSlow just helps you to load sites faster for the client (save traffic), but doesn't affect the server's CPU load.

Walkinman

Ahhh .. I wondered about that .. you just saved me countless hours of reading up about a bunch of stuff. Thank you.

It may not be the cpg component of the site at all, but the wordpress section. I have a primary domain, skolaiimages.com and an add on domain, expeditionsalaska.com .. both use wordpress for part of the site, and skolaiimages.com uses cpg as well. The cpg section of the site is easily the biggest part of the site/s and gets the most traffic. What I don't know is what, specifically, is driving up the cpu usage. My guess is the wordpress stuff, but I wanted to "leave no stone unturned" in examining the problem.

If anyone else has/had any similar problems, I'd be grateful to hear.

Thank you.

Cheers

Carl


Αndré

I don't know the Wordpress code, but I doubt a simple blog uses more resources than Coppermine with its meta albums and different permission possibilities.

In Coppermine's debug output you can find some performance information like
Page (performance)
------------------
Parameter        Current Peak   
Memory usage     7,56 MB 9,39 MB
Page generation  190 ms  190 ms
Page query time  46 ms   46 ms 
Page query count 67      67     


But I don't think that this will help you much. Maybe you find some slow queries further up in the debug output or in the MySQL logs.

Joe Carver

In my humble opinion...

    - "Powered by Coppermine" is virtually invisible on your pages

    - Showing 100 thumbs on your thumbnails / albums pages could create a resource load by itself

Also, you need to upgrade as soon as possible:
Quote: <!--Coppermine Photo Gallery 1.5.12 (stable)-->


Walkinman

hey Joe

Thanks. I hadn't thought of the 100 thumbnails per page .. I'll look at that. I changed the css to make the powered by coppermine more prominent. It displayed well enough on my calibrated monitor, but it should be brighter now.

Yes, I've emailed the guy who made some core changes to my site a while ago about upgrading to the new version, and am waiting to hear back from him. If I try to do it myself, at this point with the tweaks to the code he made for me, I'll be dealing with a lot more issues than resource usage. But I am in the process of making those changes, thanks.

André - I've always thought that, too, just from the perspective of the scale of the directories, and the size of the relative sections of the site, that wordpress couldn't possibly use as many resources. But I have no idea about such things.

I looked at the debug notice, and get this:



==========================
Page (performance)
------------------
Parameter        Current  Peak   
Memory usage     4.99 MiB 7.11 MiB
Page generation  100 ms   100 ms 
Page query time  2 ms     2 ms   
Page query count 30       30     

==========================
               


That looks close enough to what you posted to suggest it's not a problem, correct?

I think we may have found the culprit .. some old (now corrected) link structure problem is being crawled by bingbot.
Showing urls like this:

GET /Bio/alaska/bio/stock/stock/contact/bio/stock/thumbnails-22-Kenai-Peninsula-photos.html
GET /Bio/alaska/bio/stock/stock/stock/eagles/stock/stock/thumbnails-82-Muskox-photos.html
GET /Bio/alaska/bio/stock/stock/stock/eagles/stock/thumbnails-13-Grizzly-Bears-Photos.html

Thousands of them.

.... those pages don't exist, and are producing 404 errors. Because I have the site root based on wordpress, the 404 error is being called and created dynamically, every time. So I tend to suspect, at least for now, that's where the problem is (if indeed it's only ONE problem :) ). I pointed all those pages via a 301 redirect to a static page, so hopefully that solves that error. I should know some time tomorrow if the resource usage has lowered after changing that today.

No other bot crawls those urls, only bing/msn bot. They're from direct requests, not linked files.
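
A rough way to double-check which bot is behind the 404s is to tally them per user agent from the access log. The sketch below assumes the log is in Apache's "combined" format; the sample lines, IP addresses and the helper name count_404s_by_agent are made up for illustration and are not taken from the real logs.

```python
import re
from collections import Counter

# Assumes Apache "combined" log format:
# host ident user [time] "METHOD path PROTO" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_404s_by_agent(lines, prefix="/Bio/"):
    """Count 404 responses per user agent for paths starting with prefix."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        if match.group("status") == "404" and match.group("path").startswith(prefix):
            counts[match.group("agent")] += 1
    return counts


# Made-up sample lines (hypothetical IPs and timestamps).
SAMPLE = [
    '192.0.2.1 - - [20/Apr/2012:10:00:00 +0000] "GET /Bio/alaska/bio/stock/stock/'
    'contact/bio/stock/thumbnails-22-Kenai-Peninsula-photos.html HTTP/1.1" 404 512 '
    '"-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '192.0.2.2 - - [20/Apr/2012:10:00:05 +0000] "GET /stock/index.php HTTP/1.1" '
    '200 9000 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]

if __name__ == "__main__":
    for agent, hits in count_404s_by_agent(SAMPLE).most_common():
        print(hits, agent)
```

To run it against a real log, pass an open file object instead of SAMPLE; the actual log location depends on the host.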

I set a 301 redirect like this

RedirectMatch 301 ^/Bio/ http://www.skolaiimages.com/bio/index.html

so anything with that prefix goes to a static page. I don't know whether that's the best solution, or even A solution, but it seems like it should solve some things. If anyone who knows this stuff better than I do has a better solution, I'd be glad to listen.
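
As a sanity check on that rule, here is a small sketch that mimics the pattern with Python's re module. It assumes RedirectMatch applies the regex to the URL path case-sensitively, which is my understanding of Apache's default behaviour; redirect_for is a made-up helper, not an Apache API.

```python
import re

# Same pattern and target as the RedirectMatch line above.
PATTERN = re.compile(r"^/Bio/")
TARGET = "http://www.skolaiimages.com/bio/index.html"


def redirect_for(path):
    """Return the redirect target if the path matches the rule, else None."""
    return TARGET if PATTERN.search(path) else None


if __name__ == "__main__":
    for path in (
        "/Bio/alaska/bio/stock/stock/stock/eagles/stock/thumbnails-13-Grizzly-Bears-Photos.html",
        "/bio/index.html",
        "/stock/thumbnails-13-Grizzly-Bears-Photos.html",
    ):
        print(path, "->", redirect_for(path))
```

Under that assumption the lowercase /bio/index.html target does not match ^/Bio/, so the redirect should not loop back onto itself, and the real gallery pages under /stock/ are left alone.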

I'll post back when I hear from the webhost about whether this has solved anything.

Thanks so much.

Cheers

Carl