weird url showing up

Started by Walkinman, April 21, 2012, 09:34:11 PM


Walkinman

Hello

In my looking around to find why my cup usage is higher than it should be, I'm finding a lot of crawlers are hitting urls like this

http://example.com/stock/thumbnails-18-Whitetail-Deer-Photos.htmlhttp:/css/themes/water_drop/displayimage-search-0-260-ArctGrndSqrl_

Clearly, the address (generated by the SEF plugin) for that page should be http://example.com/stock/thumbnails-18-Whitetail-Deer-Photos.html

but something is also sending a link to the extended (and wrong) url above. And they're crawling a lot of pages with that same kind of url

example.com/stock/thumbnails-18-Whitetail-Deer-Photos.htmlhttp:/css/themes/water_drop/displayimage-search-0-5503-Ski-tracks-in-snow-Wrangell-St-Elias-Natio.html

etc, etc

The only bots I see crawling that stuff are from China, particularly baidu.com. I'm adding them to my blocked IP addresses, but I'm curious whether I have something coded incorrectly that's causing the above urls to be crawled.

Thank you.

Cheers

Carl
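
One way to see which crawlers are requesting these urls is to flag any request path that has a second "http:/" glued on after its start, then count the hits per user agent. This is only a minimal sketch: the (path, user agent) pairs below are hypothetical sample data, not real log entries, and pulling those pairs out of an actual access log is assumed to happen elsewhere.

```python
import re
from collections import Counter

# An "http:/" or "https:/" that appears after at least one other character
# suggests a second URL was appended to the first one.
EMBEDDED_SCHEME = re.compile(r".https?:/")


def is_malformed(path):
    """Return True if the request path contains an embedded URL scheme."""
    return bool(EMBEDDED_SCHEME.search(path))


def count_by_agent(requests):
    """Count malformed requests per user agent.

    `requests` is an iterable of (path, user_agent) pairs, assumed to be
    already extracted from the server log.
    """
    counts = Counter()
    for path, agent in requests:
        if is_malformed(path):
            counts[agent] += 1
    return counts


# Hypothetical sample entries, for illustration only.
sample = [
    ("/stock/thumbnails-18-Whitetail-Deer-Photos.html", "Googlebot"),
    ("/stock/thumbnails-18-Whitetail-Deer-Photos.htmlhttp:/css/themes/water_drop/"
     "displayimage-search-0-260-ArctGrndSqrl_", "Baiduspider"),
    ("/stock/thumbnails-18-Whitetail-Deer-Photos.htmlhttp:/css/themes/water_drop/"
     "displayimage-search-0-5503-Ski-tracks-in-snow-Wrangell-St-Elias-Natio.html",
     "Baiduspider"),
]

counts = count_by_agent(sample)
print(counts)
```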

Walkinman

ETA: I also noticed that this weird url is ONLY showing up via one album:

http://www.skolaiimages.com/stock/thumbnails-18-Whitetail-Deer-Photos.html

It doesn't show up with any of the other albums.

I've blocked baidu from crawling my site, but am curious if anyone might have an idea what is generating that set of urls.

Thank you.

Cheers

Carl

Walkinman

"cup usage" should read "cpu usage" of course .. it'd be nice if at least some editing of posts were allowed.

Thanks.

Walkinman

hello - is it possible for an admin to PLEASE edit the first post here, and change the domain name to example.com ... I'm getting hammered by google .. over 5500 'file not found' entries and rising.

What's weird is that once the url-Whitetail-Deer-Photos.htmlhttp:/css/themes/ starts, it then tries to crawl the entire coppermine-gallery with that kind of thing.

I shouldn't have typed the correct domain name in the post. Please edit or delete it.

Thank you.

Αndré


Walkinman

Thanks so much, André. I shouldn't have been so stupid as to post it with the url.

What I don't understand is why, once a crawler accesses that one url, it then proceeds to try to crawl every link in the site with that string as the prefix. It'll take searches like "displayimage-search-0-260-ArctGrndSqrl_" and search every single keyword, and display a page for each one, with http://example.com/stock/thumbnails-18-Whitetail-Deer-Photos.htmlhttp as the first part of the string. All those pages appear messed up, as the css doesn't apply correctly.

Thanks again for editing the post.

Cheers

Carl
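
One possible explanation for the spreading prefix is ordinary relative-link resolution, assuming the gallery pages use relative links and the server answers the malformed url with a normal page instead of a 404. A crawler resolves each relative link against the url it fetched, so the bad prefix carries over to every link on that page. A minimal sketch using Python's urllib.parse.urljoin; the relative link names below are hypothetical stand-ins, not taken from the actual page source.

```python
from urllib.parse import urljoin

# The malformed url from this thread, treated as the base url of the fetched page.
bad_base = (
    "http://example.com/stock/thumbnails-18-Whitetail-Deer-Photos.html"
    "http:/css/themes/water_drop/displayimage-search-0-260-ArctGrndSqrl_"
)

# Hypothetical relative links as they might appear in the page's HTML.
relative_links = [
    "displayimage-search-0-5503-Ski-tracks.html",
    "themes/water_drop/style.css",
]


def resolve_all(base, links):
    """Resolve each relative link against the base url, as a crawler would."""
    return [urljoin(base, link) for link in links]


resolved = resolve_all(bad_base, relative_links)
for url in resolved:
    print(url)
```

Under those assumptions, the page link keeps the "...Photos.htmlhttp:/css/themes/water_drop/" prefix, and the stylesheet resolves to a path that doesn't exist, which would fit the css not applying on those pages.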