In my previous post ‘Blogs are fundamentally flawed…‘ I observed that, more often than not, search results direct a user to an index-style page containing the post rather than to the post’s ‘permalink’ location. This makes for a poor user experience: on a busy blog the post has almost certainly moved down (or off) the index page since it was spidered. Google in particular appeared to be the worst offender.
Discussing the subject with Gerry, we concluded that this is most likely down to Google’s PageRank technology: the index-style pages accumulate a higher value than the post pages themselves. To get around this he suggested using robots directives to steer the crawler away from the index-style pages.
On Google’s “Information for Webmasters” help page I found that, when spidering, Google honours ‘robots.txt’ directives and robots meta tags addressed specifically to Googlebot. This meant I could single out Googlebot with these directives without affecting other search engines (which don’t exhibit the problem nearly so much).
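For illustration, the ‘robots.txt’ route would mean addressing a stanza to Googlebot by name, something like the sketch below (the ‘Disallow’ paths are hypothetical examples, not my actual configuration):
[code]
# This stanza applies to Googlebot only; other crawlers skip it
User-agent: Googlebot
Disallow: /category/
Disallow: /2005/
[/code]
The catch is that a ‘robots.txt’ ‘Disallow’ stops Googlebot fetching those pages at all, so it would never see the links on them. A per-page meta tag is the better fit, since it can forbid indexing while still permitting link-following.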
I basically want Google to ‘FOLLOW’ links on all pages, but not to ‘INDEX’ the index-style pages like categories & archives by date. The desired effect is that Google can still find every post as before but simply ignores the index-style pages themselves. Implementing this is quite simple; I modified my theme’s “header.php” file, inserting the following code in the “head” section:
[php]<?php
// Ask Googlebot not to index this page, but still follow its links
if ( !is_single() && !is_page() && !is_home() )
	echo "<meta name=\"googlebot\" content=\"NOINDEX,FOLLOW\" />\n";
?>[/php]
This reads almost literally: if this is not a single-post view, not a page view and not the home page, output the “meta” tag. Although the home page is an index-style page, I am reluctant to add ‘NOINDEX’ to it because I don’t want it disappearing from search results. 😉
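As a quick sanity check, viewing the HTML source of a category or date-archive page should now show the tag in the head section, while single posts, pages and the home page should show no such tag:
[code]
<meta name="googlebot" content="NOINDEX,FOLLOW" />
[/code]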
Now for the long wait until the changes are reflected in Google’s results.
Updated 24th January 2006 – Gerry pointed out the condition can be optimised using De Morgan’s Law: !a && !b && !c is equivalent to !(a || b || c) 😛
[php]<?php
// Equivalent test with De Morgan's Law applied
if ( ! ( is_single() || is_page() || is_home() ) )
	echo "<meta name=\"googlebot\" content=\"NOINDEX,FOLLOW\" />\n";
?>[/php]