coppermine-gallery.com/forum

No Support => General discussion (no support!) => Topic started by: Sogeri on October 02, 2003, 06:22:51 AM

Title: How to prevent entire site downloads
Post by: Sogeri on October 02, 2003, 06:22:51 AM
My site http://www.orchidspng.com gets an average of 25,000 page hits a day, with some days peaking at over 100,000. From my web stats, it appears that some people are downloading the entire site. Yesterday's traffic was over 1 GB! And that costs money; it is a hobby site.

Other than restricting access to registered users only, using a throttle (I find it hard to guess what a reasonable number of hits per hour or day would be), or using .htaccess, is there any other method to block a single IP from accessing the site too often within a given time frame?
Title: How to prevent entire site downloads
Post by: hyperion on October 02, 2003, 07:04:27 AM
Yes, but it could get complicated. Basically, you store the visitor's IP address along with a timestamp for every hit. You then delete entries once they are older than a certain age. You then count the number of times an IP address is in the list (or increment a counter, etc.), and redirect to an explanation page when it exceeds the allowed number of hits in the time frame. You put the call to the function at the beginning of every page by placing it in the theme.php file.
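The approach above amounts to a sliding-window counter per IP address. Here is a minimal sketch in Python rather than the PHP that theme.php would need; the `HitLimiter` class, the limit of 3 hits per 60 seconds, and the injectable clock are hypothetical choices for illustration. It keeps the list in memory, whereas a real PHP site would have to store it somewhere shared between requests, such as a database table.

```python
import time
from collections import defaultdict, deque


class HitLimiter:
    """Hypothetical per-IP sliding-window limiter (not part of Coppermine)."""

    def __init__(self, max_hits, window_seconds, clock=time.monotonic):
        self.max_hits = max_hits
        self.window = window_seconds
        self.clock = clock
        # IP address -> timestamps of its recent hits, oldest first
        self.hits = defaultdict(deque)

    def allow(self, ip):
        """Record a hit from `ip`; return False once it is over the limit."""
        now = self.clock()
        stamps = self.hits[ip]
        # Delete entries that have aged out of the time window
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.max_hits:
            return False  # caller redirects to the explanation page
        stamps.append(now)
        return True


# Demo with a fake clock: 3 hits per 60 seconds
fake_now = [0.0]
limiter = HitLimiter(max_hits=3, window_seconds=60, clock=lambda: fake_now[0])
print([limiter.allow("203.0.113.5") for _ in range(4)])  # [True, True, True, False]
fake_now[0] = 61.0
print(limiter.allow("203.0.113.5"))  # True: the earlier hits have expired
```

Refused hits are not recorded in this sketch, so a client regains access as soon as its old hits age out; recording them as well would keep a persistent downloader locked out for longer.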

Some of those downloaders might be spiders or robots that obey instructions. Use robots meta tags and a robots.txt file to try to keep them under control.

Great orchid shots, BTW. :)
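To see how a well-behaved robot interprets robots.txt, Python's standard `urllib.robotparser` module can evaluate a sample file. The rules are hypothetical: they assume the images live under `/albums/` and use `www.example.com` as a stand-in host. `Crawl-delay` is a non-standard directive that only some crawlers honour, and site rippers are free to ignore the whole file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all robots out of the albums directory and
# ask them to wait 10 seconds between requests.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /albums/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ("/index.php", "/albums/orchids/photo1.jpg"):
    url = "http://www.example.com" + path
    verdict = "allowed" if parser.can_fetch("SomeBot", url) else "disallowed"
    print(path, verdict)

print("crawl delay:", parser.crawl_delay("SomeBot"))
```

A compliant robot would fetch `/index.php` but skip everything under `/albums/`, and pause 10 seconds between requests if it supports `Crawl-delay`.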
Title: How to prevent entire site downloads
Post by: gtroll on October 02, 2003, 07:24:19 AM
Your site downloaders are probably using a bot; you can ban them in your .htaccess file:
http://www.webmasterworld.com/forum13/687.htm
Title: How to prevent entire site downloads
Post by: Jim on October 02, 2003, 07:36:04 AM
The WebmasterWorld thread is for members only :(
Title: How to prevent entire site downloads
Post by: Sogeri on October 02, 2003, 07:40:15 AM
:D Thanks for that. I will upload the .htaccess file as suggested.
Title: .htaccess stops bots
Post by: gtroll on October 02, 2003, 07:44:08 AM
Here you go, Jim, the contents of the post there:
Quote from toolman of WebmasterWorld:
# Deny web access to the .htaccess file itself
<Files .htaccess>
deny from all
</Files>
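# Return 403 Forbidden to known site rippers and e-mail harvesters,
# matched by the start of their User-Agent string
RewriteEngine on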
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
# Forbid every request whose Referer is exactly the URL below (a site you want to refuse traffic from)
RewriteCond %{HTTP_REFERER} ^http://www\.your-site\.com$
RewriteRule .* - [F]
Title: How to prevent entire site downloads
Post by: Tarique Sani on October 02, 2003, 08:05:18 AM
I don't want to rain on the parade, BUT spoofing of USER_AGENT is built into most new URL fetchers. I guess the correct way is to configure Apache with mod_throttle OR mod_bandwidth.
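The spoofing problem can be checked with a small Python sketch that applies a few of the User-Agent patterns from the quoted .htaccess the way the [OR]-chained RewriteCond lines would. It uses Python's `re` rather than Apache's regex engine, which behave the same for these simple prefix patterns, and the sample User-Agent strings are made up. The header is supplied by the client, so a tool told to send a browser-like string (Wget, for example, has a `--user-agent` option) never matches the ban list.

```python
import re

# A few User-Agent patterns taken from the quoted .htaccess. Without [NC],
# RewriteCond matches case-sensitively, as Python's re does by default.
BLOCKED_AGENTS = [re.compile(p) for p in (
    r"^EmailSiphon", r"^Teleport", r"^Wget", r"^[Ww]eb[Bb]andit",
)]


def is_blocked(user_agent):
    """True if any pattern matches, mimicking the [OR]-chained conditions."""
    return any(p.search(user_agent) for p in BLOCKED_AGENTS)


honest = "Wget/1.8.2"
spoofed = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"  # same tool, disguised

print(is_blocked(honest))   # True: would get 403 Forbidden
print(is_blocked(spoofed))  # False: served normally
```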
Title: How to prevent entire site downloads
Post by: Sogeri on October 02, 2003, 08:20:47 AM
I found an even more extensive .htaccess file here:

http://tech.ratmachines.com/downloads/sample_wbmw.txt

So, which file would be best to use?
Title: Ok but...
Post by: epsilon on December 04, 2003, 05:01:58 PM
In which directory must I put this .htaccess? In the albums directory only?
Title: Re: Ok but...
Post by: Joachim Müller on December 04, 2003, 07:11:27 PM
Quote from: "epsilon"
In which directory must I put this .htaccess? In the albums directory only?
Yes.
Title: More explicit... Please
Post by: epsilon on January 08, 2004, 03:21:41 AM
Quote from: "Tarique Sani"
I don't want to rain on the parade, BUT spoofing of USER_AGENT is built into most new URL fetchers. I guess the correct way is to configure Apache with mod_throttle OR mod_bandwidth.

How can I do that? I have mod_rewrite turned on so that I can use the .htaccess commands; once I enable mod_throttle and mod_bandwidth, what must I do?

Thanks
Title: How to prevent entire site downloads
Post by: Tarique Sani on January 08, 2004, 04:21:31 AM
See http://www.snert.com/Software/mod_throttle/

I don't use it, as I don't need it.