
To exclude a page with meta-exclusion, simply place the following code in the <head> section of the
HTML document you want to exclude:
<meta name="robots" content="noindex, nofollow" />
This indicates that the page should not be indexed (noindex) and that none of the links on the page
should be followed (nofollow). It is relatively easy to apply some simple programming logic to decide
whether or not to include such a meta tag on the pages of your site. This method is always applicable,
so long as you have access to the source code of the application, whereas robots.txt exclusion may be
difficult or even impossible to apply in certain cases.
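The "simple programming logic" mentioned above could look something like the following minimal sketch. It is written in Python purely for illustration; the function names (robots_meta_tag, render_head) and the is_duplicate flag are hypothetical and do not come from any particular application.

```python
from html import escape


def robots_meta_tag(exclude: bool, spider: str = "robots") -> str:
    """Return a meta-exclusion tag, or an empty string if the page may be indexed.

    `spider` defaults to "robots", which addresses all spiders.
    """
    if not exclude:
        return ""
    name = escape(spider, quote=True)
    return f'<meta name="{name}" content="noindex, nofollow" />'


def render_head(title: str, is_duplicate: bool) -> str:
    """Build a <head> section, adding the exclusion tag only for duplicate pages.

    `is_duplicate` is an assumed flag that the application sets for pages
    (for example, print-friendly views) that it does not want indexed.
    """
    parts = [f"<title>{escape(title)}</title>"]
    tag = robots_meta_tag(is_duplicate)
    if tag:
        parts.append(tag)
    return "<head>\n" + "\n".join(parts) + "\n</head>"
```

Calling render_head("Print view", True) would emit the exclusion tag inside the head section, while render_head("Product page", False) would leave it out.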
To exclude a specific spider, change "robots" to the name of the spider, for example googlebot,
msnbot, or slurp. To exclude multiple spiders, you can use multiple meta tags. For example, to
exclude googlebot and msnbot:
<meta name="googlebot" content="noindex, nofollow" />
<meta name="msnbot" content="noindex, nofollow" />
Table 5-1 shows the common user agent names used by the various major search engines.
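If the list of spiders to exclude varies from page to page, the per-spider tags can be generated rather than typed by hand. The following is only a sketch in Python; exclusion_tags is a hypothetical helper name, and the spider names passed to it are the ones used in the text.

```python
def exclusion_tags(spiders):
    """Return one noindex/nofollow meta tag per spider name, one tag per line."""
    return "\n".join(
        f'<meta name="{name}" content="noindex, nofollow" />'
        for name in spiders
    )


# Produces the two tags for googlebot and msnbot.
print(exclusion_tags(["googlebot", "msnbot"]))
```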
In theory, this method is equivalent to the next method that is discussed, robots.txt. The only downside
is that the page must be fetched in order to determine that it should not be indexed in the first place.
This is likely to slow down indexing. Dan Thies also notes in The Search Engine Marketing Kit that
"if your site serves 10 duplicate pages for every page of unique content, spiders may still give up
indexing ... you can't count on the search engines to fish through your site looking for unique content."
As mentioned, two technical limitations are associated with using the meta-exclusion method:
• It requires access to the source code of the application. Otherwise, meta tag exclusion becomes
  impossible because the tag must be placed in the web pages generated by the application.
• It only works with HTML files, not with clear text, CSS, or binary/image files.
These limitations can be addressed by using the robots.txt file, which is discussed next. However,
robots.txt also has some limitations as to its application. Still, if you do not have access to the
source code of a web application, robots.txt is your only option.
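The two limitations can be summarized as a small decision rule. The sketch below, in Python, is illustrative only: choose_exclusion_method is a hypothetical name, and treating "text/html" and "application/xhtml+xml" as the HTML content types is an assumption made for this example.

```python
def choose_exclusion_method(has_source_access: bool, content_type: str) -> str:
    """Return "meta" when meta-exclusion is usable, otherwise "robots.txt".

    Meta-exclusion needs both access to the application's source code and
    an HTML response; in every other case robots.txt is the fallback.
    """
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    is_html = media_type in {"text/html", "application/xhtml+xml"}
    if has_source_access and is_html:
        return "meta"
    return "robots.txt"
```

A result of "meta" only means that meta-exclusion is available for that resource; robots.txt may still be usable as well.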
Table 5-1
Search Engine    User Agent
Google           Googlebot
Yahoo!           Slurp
MSN Search       Msnbot
Ask              Teoma
Chapter 5: Duplicate Content