Adding meta tag for robots in secondary pages

Started by mlduclos, June 18, 2006, 01:22:29 AM

mlduclos

Hello

Is there a way to tell crawler bots such as Google and Yahoo! Slurp not to index the secondary pages of the gallery? I.e., make only the index and picture pages crawlable, and not login, theme variations, polls, votes, etc.

How can I optimize the gallery for search engines? With 700 albums and about 1500 pictures you get a huge number of possible URL variations. Can someone point me to where I can find this information? I'm thinking of adding the meta tag <META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW"> to the appropriate pages.

Thanks!

Joachim Müller

You're correct - you can accomplish this by adding the suggested meta tag to your non-index pages. The meta tags for each page are handed over to the function pageheader that is used to create the meta tag. You'll have to add your custom content by adding the "noindex, nofollow" stuff to the section where the meta data are handed over to the function. The files that are spiderable are basically thumbnails.php, displayimage.php, upload.php, login.php. As an example, I'll explain how this could be done editing displayimage.php:
Find

pageheader($album_name . '/' . $picture_title, $meta_keywords, false);

and replace with

$meta_keep_robots_away = '<meta name="robots" content="noindex,follow" />';
pageheader($album_name . '/' . $picture_title, $meta_keywords . $meta_keep_robots_away, false);

You get the idea, don't you?
This method has the drawback that you will have to touch a lot of files and modify them, and then edit those files again each time you update, so it is not very elegant. The more elegant approach is to deny spidering for all pages except the index page. To accomplish this, edit themes/yourtheme/theme.php and find

function pageheader($section, $meta = '')
{
    global $CONFIG, $THEME_DIR;
    global $template_header, $lang_charset, $lang_text_dir;

    $custom_header = cpg_get_custom_include($CONFIG['custom_header_path']);

    $charset = ($CONFIG['charset'] == 'language file') ? $lang_charset : $CONFIG['charset'];

    header('P3P: CP="CAO DSP COR CURa ADMa DEVa OUR IND PHY ONL UNI COM NAV INT DEM PRE"');
    header("Content-Type: text/html; charset=$charset");
    user_save_profile();

    $template_vars = array('{LANG_DIR}' => $lang_text_dir,
        '{TITLE}' => $CONFIG['gallery_name'] . ' - ' . strip_tags(bb_decode($section)),
        '{CHARSET}' => $charset,
        '{META}' => $meta,
        '{GAL_NAME}' => $CONFIG['gallery_name'],
        '{GAL_DESCRIPTION}' => $CONFIG['gallery_description'],
        '{SYS_MENU}' => theme_main_menu('sys_menu'),
        '{SUB_MENU}' => theme_main_menu('sub_menu'),
        '{ADMIN_MENU}' => theme_admin_mode_menu(),
        '{CUSTOM_HEADER}' => $custom_header,
        );

    echo template_eval($template_header, $template_vars);
}

If this section is not there, copy it into themes/yourtheme/theme.php on a new line right before ?>

Then edit it in this way: replace the code with

function pageheader($section, $meta = '')
{
    global $CONFIG, $THEME_DIR;
    global $template_header, $lang_charset, $lang_text_dir;

    $custom_header = cpg_get_custom_include($CONFIG['custom_header_path']);

    $charset = ($CONFIG['charset'] == 'language file') ? $lang_charset : $CONFIG['charset'];

    header('P3P: CP="CAO DSP COR CURa ADMa DEVa OUR IND PHY ONL UNI COM NAV INT DEM PRE"');
    header("Content-Type: text/html; charset=$charset");
    user_save_profile();

    // Unless the caller explicitly allowed indexing, keep robots away.
    if ($meta != '<meta name="robots" content="index,follow" />') {
        $meta .= "\n" . '<meta name="robots" content="noindex,nofollow" />';
    }

    $template_vars = array('{LANG_DIR}' => $lang_text_dir,
        '{TITLE}' => $CONFIG['gallery_name'] . ' - ' . strip_tags(bb_decode($section)),
        '{CHARSET}' => $charset,
        '{META}' => $meta,
        '{GAL_NAME}' => $CONFIG['gallery_name'],
        '{GAL_DESCRIPTION}' => $CONFIG['gallery_description'],
        '{SYS_MENU}' => theme_main_menu('sys_menu'),
        '{SUB_MENU}' => theme_main_menu('sub_menu'),
        '{ADMIN_MENU}' => theme_admin_mode_menu(),
        '{CUSTOM_HEADER}' => $custom_header,
        );

    echo template_eval($template_header, $template_vars);
}

The added code checks whether the pageheader function is being called with an explicit instruction to allow spidering. If that instruction isn't given, it disallows spidering. All that is left to do is to add the code that explicitly allows spidering for the index page. To do so, edit index.php, find

    pageheader($BREADCRUMB_TEXT ? $BREADCRUMB_TEXT : $lang_index_php['welcome']);

and replace with

    if ($BREADCRUMB_TEXT != '') {
        pageheader($BREADCRUMB_TEXT, '<meta name="robots" content="index,follow" />');
    } else {
        pageheader($lang_index_php['welcome'], '<meta name="robots" content="index,follow" />');
    }

Save your changes, upload and test-drive. Please confirm.
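
As a rough sketch of the check described above, the decision can be isolated in a small helper. The function name apply_default_robots is hypothetical and not part of Coppermine. Unlike the exact string comparison in the theme code, it looks for any existing robots tag, so other meta content (keywords, for example) could be passed along with the explicit permission.

```php
<?php
// Hypothetical helper, not part of Coppermine: appends a restrictive
// robots tag unless the caller already supplied a robots tag of its own.
function apply_default_robots(string $meta): string
{
    // stripos() is case-insensitive, so name="ROBOTS" is recognised too.
    if (stripos($meta, 'name="robots"') !== false) {
        return $meta; // the caller made an explicit choice, keep it
    }
    return $meta . "\n" . '<meta name="robots" content="noindex,nofollow" />';
}

// Index page: the explicit permission is preserved.
echo apply_default_robots('<meta name="robots" content="index,follow" />'), "\n";
// Any other page: the restrictive default is appended.
echo apply_default_robots('<meta name="keywords" content="holiday" />'), "\n";
```

With this variant the index page could pass its keywords and the "index,follow" tag in one string without the permission being overridden.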

mlduclos

Hello

Thanks for the great support!

I made these modifications and they seem to work. I allow the spider in index.php and thumbnails.php. Thumbnails still have a lot of possible secondary variations, but I think it's important to crawl each album name. I will now tell Google to revisit the gallery via the console, and let's see what happens.

It's annoying to see in the server logs the incessant movement of crawlers visiting the whole site daily, because my pages aren't that dynamic; they basically don't change, except the forum. And Google has about 17,000 pages indexed in my gallery (which has "only" 363 albums with 837 pictures). If you search the gallery, Google displays this message:

"In order to show you the most relevant results, we have omitted some entries very similar to the 3 already displayed. If you like, you can repeat the search with the omitted results included." It shows only 3 pages. I don't think it sends many visitors to the gallery.

Thanks!!

Joachim Müller

Make the spider not follow links to meta albums etc. You can do so by diving into theme.php and changing the links you don't want to see spidered from <a href="foo"> to <a href="foo" rel="nofollow">.
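
To illustrate the idea, here is a minimal sketch of a link builder that adds the attribute on request. The helper name build_link and the sample link are made up for illustration; Coppermine's theme.php builds its links from templates, so this only shows the principle.

```php
<?php
// Hypothetical helper, not part of Coppermine: builds an anchor tag and
// optionally marks it rel="nofollow" so crawlers do not follow it.
function build_link(string $href, string $label, bool $nofollow = false): string
{
    $rel = $nofollow ? ' rel="nofollow"' : '';
    return '<a href="' . htmlspecialchars($href, ENT_QUOTES) . '"' . $rel . '>'
        . htmlspecialchars($label, ENT_QUOTES) . '</a>';
}

// A meta album link that crawlers should not follow.
echo build_link('thumbnails.php?album=lastup', 'Last uploads', true), "\n";
// A regular album link is left untouched.
echo build_link('thumbnails.php?album=1', 'Holiday', false), "\n";
```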

twistedcain

I just started using the gallery recently, and only have about 300 files uploaded. I ran XENU's Link Sleuth to check for bad links and build a sitemap and it turned up over 6,800 links. So, yes, a lot of duplicate pages. I wasn't sure where to begin, so thanks for pointing out the pageheader function.

Here is how I set it up to get rid of duplicate pages. I have only briefly tested it, but it seems to work. Would be grateful if anyone pointed out any mistakes I made or problems with the implementation.

Step 1: The first thing I needed was to grab the php_self and php_request variables,

$php_self = basename($_SERVER["PHP_SELF"]); // PHP_SELF is a path such as "/index.php"; keep only the file name
$php_request = $_SERVER["REQUEST_URI"];
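
One detail worth checking at this step: on a typical web server, $_SERVER["PHP_SELF"] holds the script path with a leading slash and any subdirectory, for example /gallery/index.php, so a direct comparison of the raw value with 'index.php' would not match. A minimal sketch of normalizing it with basename() follows; the sample paths are made up for illustration.

```php
<?php
// Sample values of the kind a web server puts in $_SERVER["PHP_SELF"]
// (made up for illustration).
$samples = array('/index.php', '/gallery/index.php', '/gallery/search.php');

foreach ($samples as $php_self) {
    // basename() strips the directory part and leaves just the file name.
    $script = basename($php_self);
    echo $php_self, ' -> ', $script, "\n";
}
```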


Step 2: Next I needed to check for pages I wanted to be indexed and followed,

Index.php includes the home page and the category pages, so all of these need to be indexed and followed,

$php_self == 'index.php'

I want the search page to be indexed, this of course is optional,

$php_self == 'search.php'

The following will check for strictly the album pages,

preg_match( '/^.thumbnails.php.album.[0-9]+$/', $php_request )

If an album has multiple pages, I want to index the first page ( i.e. /thumbnails.php?album=1 ).
I don't want to index the first page's duplicate in a multi-page album ( i.e. /thumbnails.php?album=1&page=1 ).
I also need to index the other pages ( i.e. /thumbnails.php?album=1&page=5 ).

preg_match( '/.thumbnails.php.album.[0-9]+.page.([2-9]{1}|[0-9]{2,})/', $php_request )
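
As a quick check of which page numbers the pattern above accepts, here is a small sketch. The sample URLs are assumptions that follow the examples given in this step, with the gallery at the web root.

```php
<?php
// Pattern from the step above: page 2-9, or any page with two or more digits.
$pattern = '/.thumbnails.php.album.[0-9]+.page.([2-9]{1}|[0-9]{2,})/';

$urls = array(
    '/thumbnails.php?album=1&page=1',  // duplicate of the first page
    '/thumbnails.php?album=1&page=5',
    '/thumbnails.php?album=1&page=12',
);

foreach ($urls as $url) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $hit = preg_match($pattern, $url) === 1;
    echo $url, ' => ', $hit ? 'index' : 'skip', "\n";
}
```

Only page=1 is skipped, because a single digit has to fall in the range 2-9 and anything with two or more digits is accepted.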

Next I need to index the files,

preg_match( '/.displayimage.php.pos..[0-9]+/', $php_request )

Step 3: Now I need to check for pages I don't want indexed or followed (to keep the web crawlers from wasting bandwidth crawling unindexed duplicate pages).

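Check for thumbnails.php duplicate albums; lastup, lastcom, topn, toprated, favpics, and search (note the parentheses: square brackets would define a set of single characters, not a list of words),

preg_match( '/.thumbnails.php.album.(lastup|lastcom|topn|toprated|favpics|search)/', $php_request )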

Tell it not to index ratepic.php or addfav.php,

preg_match( '/.(ratepic|addfav).php/', $php_request )

Step 4: Anything that isn't included above will be followed but not indexed.
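
The four steps can be sketched as one standalone function, which makes the rules easy to try against sample URLs before touching theme.php. The function name robots_directive is hypothetical and not part of Coppermine, the patterns are escaped variants of the ones above (an assumption about the exact URL layout), and the sample URLs assume the gallery sits at the web root.

```php
<?php
// Hypothetical helper, not part of Coppermine: returns the robots "content"
// value for a request, following the three groups described in the steps.
function robots_directive(string $php_self, string $request_uri): string
{
    $script = basename($php_self); // "/gallery/index.php" -> "index.php"

    // Group 1: pages that should be indexed and followed.
    if ($script === 'index.php'
        || $script === 'search.php'
        || preg_match('/thumbnails\.php\?album=[0-9]+$/', $request_uri)
        || preg_match('/thumbnails\.php\?album=[0-9]+&page=([2-9]|[0-9]{2,})/', $request_uri)
        || preg_match('/displayimage\.php\?pos=-?[0-9]+/', $request_uri)
    ) {
        return 'index,follow';
    }

    // Group 2: duplicate listings and action pages, neither indexed nor followed.
    if (preg_match('/thumbnails\.php\?album=(lastup|lastcom|topn|toprated|favpics|search)/', $request_uri)
        || preg_match('/(ratepic|addfav)\.php/', $request_uri)
    ) {
        return 'noindex,nofollow';
    }

    // Group 3: everything else is followed but not indexed.
    return 'noindex,follow';
}

// Try a few sample requests.
$requests = array(
    array('/index.php', '/index.php?cat=2'),
    array('/thumbnails.php', '/thumbnails.php?album=1'),
    array('/thumbnails.php', '/thumbnails.php?album=1&page=1'),
    array('/thumbnails.php', '/thumbnails.php?album=lastup&cat=0'),
    array('/login.php', '/login.php'),
);
foreach ($requests as $r) {
    echo $r[1], ' => ', robots_directive($r[0], $r[1]), "\n";
}
```

Returning only the directive keeps the rules separate from the HTML, so the meta tag itself can be assembled in one place inside pageheader.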

Here is the code, just add it to or modify it in your "theme.php" file,

// Function for writing a pageheader
function pageheader($section, $meta = '')
{
    global $CONFIG, $THEME_DIR;
    global $template_header, $lang_charset, $lang_text_dir;

    // PHP_SELF is a path such as "/index.php", so compare the file name only.
    $php_self = basename($_SERVER["PHP_SELF"]);
    $php_request = $_SERVER["REQUEST_URI"];

    if ($php_self == 'index.php' ||
        $php_self == 'search.php' ||
        preg_match( '/^.thumbnails.php.album.[0-9]+$/', $php_request ) ||
        preg_match( '/.thumbnails.php.album.[0-9]+.page.([2-9]{1}|[0-9]{2,})/', $php_request ) ||
        preg_match( '/.displayimage.php.pos..[0-9]+/', $php_request ))
    {
        $meta .= '<meta name="robots" content="index,follow" />'."\n";
    }
    // Parentheses group the alternatives; square brackets would match single characters.
    else if (preg_match( '/.thumbnails.php.album.(lastup|lastcom|topn|toprated|favpics|search)/', $php_request ) ||
        preg_match( '/.(ratepic|addfav).php/', $php_request ))
    {
        $meta .= '<meta name="robots" content="noindex,nofollow" />'."\n";
    }
    else {
        $meta .= '<meta name="robots" content="noindex,follow" />'."\n";
    }

    $custom_header = cpg_get_custom_include($CONFIG['custom_header_path']);

    $charset = ($CONFIG['charset'] == 'language file') ? $lang_charset : $CONFIG['charset'];

    header('P3P: CP="CAO DSP COR CURa ADMa DEVa OUR IND PHY ONL UNI COM NAV INT DEM PRE"');
    header("Content-Type: text/html; charset=$charset");
    user_save_profile();

    $template_vars = array('{LANG_DIR}' => $lang_text_dir,
        '{TITLE}' => $CONFIG['gallery_name'] . ' - ' . strip_tags(bb_decode($section)),
        '{CHARSET}' => $charset,
        '{META}' => $meta,
        '{GAL_NAME}' => $CONFIG['gallery_name'],
        '{GAL_DESCRIPTION}' => $CONFIG['gallery_description'],
        '{SYS_MENU}' => theme_main_menu('sys_menu'),
        '{SUB_MENU}' => theme_main_menu('sub_menu'),
        '{ADMIN_MENU}' => theme_admin_mode_menu(),
        '{CUSTOM_HEADER}' => $custom_header,
        );

    echo template_eval($template_header, $template_vars);
}



I have one more concern about duplication issues, although I should probably save it for a separate thread. Google recently added a section for their webmaster tools that shows pages that have duplicate titles. I was going to suggest that when a person is uploading images that the CPG software warn them if a duplicate title already exists in the database. Not forcing a person to change it, but warning them that they will have 2 pages that feature the same title. Just a thought.

twistedcain

Can't edit my own posts? Anyway, I left a huge hole in my above code (probably one of many).

The sorting options (TITLE + -, FILE NAME + -, DATE + -, POSITION + -) would all be indexed under my code above, adding a multitude of duplicate pages.

To fix it, I need to change the following line from,

preg_match( '/.thumbnails.php.album.[0-9]+.page.([2-9]{1}|[0-9]{2,})/', $php_request )

to,

preg_match( '/^.thumbnails.php.album.[0-9]+.page.([2-9]{1}|[0-9]{2,})$/', $php_request )

Even better would be to add the following line to the nofollow,noindex section, since there is really no need for the crawler to index the sort pages,

preg_match( '/.thumbnails.php.album.[0-9]+.page.[0-9]+.sort/', $php_request )
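
A small sketch of how the two patterns from this post behave on a sorted and an unsorted page. The sample URLs, and in particular the sort=ta value, are assumptions for illustration.

```php
<?php
// Anchored page pattern and the sort pattern from the post above.
$page_pattern = '/^.thumbnails.php.album.[0-9]+.page.([2-9]{1}|[0-9]{2,})$/';
$sort_pattern = '/.thumbnails.php.album.[0-9]+.page.[0-9]+.sort/';

// Sample URLs; the "sort=ta" value is an assumption for illustration.
$urls = array(
    '/thumbnails.php?album=1&page=2',
    '/thumbnails.php?album=1&page=2&sort=ta',
);

foreach ($urls as $url) {
    $indexable = preg_match($page_pattern, $url) === 1;
    $is_sorted = preg_match($sort_pattern, $url) === 1;
    echo $url, "\n";
    echo '  indexable page: ', $indexable ? 'yes' : 'no', "\n";
    echo '  sort variant:   ', $is_sorted ? 'yes' : 'no', "\n";
}
```

The trailing $ is what stops the page pattern from accepting the sorted URL, since anything after the page number now prevents a match.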

Here is the updated code, goes in your "theme.php",

// Function for writing a pageheader
function pageheader($section, $meta = '')
{
    global $CONFIG, $THEME_DIR;
    global $template_header, $lang_charset, $lang_text_dir;

    // PHP_SELF is a path such as "/index.php", so compare the file name only.
    $php_self = basename($_SERVER["PHP_SELF"]);
    $php_request = $_SERVER["REQUEST_URI"];

    if( $php_self == 'index.php' ||
        $php_self == 'search.php' ||
        preg_match( '/^.thumbnails.php.album.[0-9]+$/', $php_request ) ||
        preg_match( '/^.thumbnails.php.album.[0-9]+.page.([2-9]{1}|[0-9]{2,})$/', $php_request ) ||
        preg_match( '/.displayimage.php.pos..[0-9]+/', $php_request ) )
    {
        $meta .= '<meta name="robots" content="index,follow" />'."\n";
    }
    // Parentheses group the alternatives; square brackets would match single characters.
    else if( preg_match( '/.thumbnails.php.album.(lastup|lastcom|topn|toprated|favpics|search)/', $php_request ) ||
        preg_match( '/.thumbnails.php.album.[0-9]+.page.[0-9]+.sort/', $php_request ) ||
        preg_match( '/.(ratepic|addfav).php/', $php_request ) )
    {
        $meta .= '<meta name="robots" content="noindex,nofollow" />'."\n";
    }
    else {
        $meta .= '<meta name="robots" content="noindex,follow" />'."\n";
    }

    $custom_header = cpg_get_custom_include($CONFIG['custom_header_path']);

    $charset = ($CONFIG['charset'] == 'language file') ? $lang_charset : $CONFIG['charset'];

    header('P3P: CP="CAO DSP COR CURa ADMa DEVa OUR IND PHY ONL UNI COM NAV INT DEM PRE"');
    header("Content-Type: text/html; charset=$charset");
    user_save_profile();

    $template_vars = array('{LANG_DIR}' => $lang_text_dir,
        '{TITLE}' => $CONFIG['gallery_name'] . ' - ' . strip_tags(bb_decode($section)),
        '{CHARSET}' => $charset,
        '{META}' => $meta,
        '{GAL_NAME}' => $CONFIG['gallery_name'],
        '{GAL_DESCRIPTION}' => $CONFIG['gallery_description'],
        '{SYS_MENU}' => theme_main_menu('sys_menu'),
        '{SUB_MENU}' => theme_main_menu('sub_menu'),
        '{ADMIN_MENU}' => theme_admin_mode_menu(),
        '{CUSTOM_HEADER}' => $custom_header,
        );

    echo template_eval($template_header, $template_vars);
}