One last example:
User-agent: googlebot
Disallow:
User-agent: *
Disallow: /
These rules would only allow Google to spider your site, because the more specific rule for googlebot overrides the rule for *.

We recommend that webmasters place the exclusions for the default rule, *, last. According to the standard, this should not matter. However, there is some ambiguity as to whether a web spider picks the first matching rule or the most specific matching rule. In the former case, if the * rule is placed first, it could be applied even to a spider that has its own, more specific rule. Listing the * rules last removes that ambiguity.
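To see why the ordering matters, consider the same two records in the reverse order. A spider that takes the first record whose User-agent line matches it (and * matches every spider) would apply the blanket exclusion before ever reaching its own record:

User-agent: *
Disallow: /

User-agent: googlebot
Disallow:

Under that hypothetical first-match interpretation, even Google would be locked out of the entire site, which is the opposite of the intended effect.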
Generating robots.txt On-the-Fly
Nothing prevents a site developer from generating the robots.txt file programmatically, on-the-fly. Include the following rule in .htaccess to map requests for robots.txt to robots.php, and use the robots.php script to generate the file. In this fashion, you can use program logic similar to that used for meta tag exclusion in order to generate a robots.txt file.
The following rule in .htaccess delegates the requests for robots.txt to robots.php:
RewriteEngine On
RewriteRule ^robots\.txt$ /robots.php
The robots.php file could look like this:
<?php
header('Content-type: text/plain');
...
...
?>
# static parts of robots.txt can be added here
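Note that anything placed after the closing ?> tag is sent to the client verbatim, which is why static robots.txt directives can simply be appended there. As a minimal sketch of what the dynamic part might look like (the $excluded_paths array is a placeholder for whatever exclusion logic your site actually uses), robots.php could be written as:

<?php
// robots.php -- generates robots.txt dynamically.
// Serve the output as plain text, as required for robots.txt.
header('Content-type: text/plain');

// Placeholder list of paths to keep spiders away from; in practice
// this would be computed by the same logic used for meta tag exclusion.
$excluded_paths = array('/print/', '/search/', '/cart/');

// Emit a default record that applies to all spiders.
echo "User-agent: *\n";
foreach ($excluded_paths as $path) {
    echo 'Disallow: ' . $path . "\n";
}
?>
# static parts of robots.txt can be added here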
You will see a real-life example of generating robots.txt on the fly in Chapter 14.
Handling robots.txt Limitations
Suppose a site has a number of products at URLs that look like /products.php?product_id=<number>, and a number of print-friendly product pages at the URL /products.php?product_id=<number>&print=1.
A standard robots.txt file cannot be used to eliminate these print-friendly pages, because a Disallow rule matches URLs from the left, as a prefix, and the print=1 parameter sits at the end of an otherwise identical URL. There would have to be a robots.txt entry for every page, and at that point the approach degenerates into a case similar to meta tag exclusion; in that case it is simpler to use meta-exclusion. Furthermore, it is reported that there is a limit of 5,000 characters for a robots.txt file.
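As a sketch of that meta-exclusion approach (the markup and parameter handling here are assumptions for illustration), products.php can emit a noindex robots meta tag whenever it is rendering a print-friendly version:

<?php
// products.php -- sketch of meta tag exclusion for print-friendly pages.
$product_id = isset($_GET['product_id']) ? (int) $_GET['product_id'] : 0;
$is_print   = isset($_GET['print']) && $_GET['print'] == '1';
?>
<html>
<head>
<title>Product <?php echo $product_id; ?></title>
<?php if ($is_print) { ?>
<!-- keep spiders from indexing the duplicate print-friendly version -->
<meta name="robots" content="noindex, nofollow" />
<?php } ?>
</head>
<body>
<!-- ... product page content would be rendered here ... -->
</body>
</html>

Because the exclusion decision is made inside the script itself, this scales to any number of product pages without growing robots.txt at all.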