Professional Search Engine Optimization (SEO). Developer’s Guide to SEO

To exclude a page with meta-exclusion, simply place the following code in the <head> section of the HTML document you want to exclude:

<meta name="robots" content="noindex, nofollow" />

This indicates that the page should not be indexed (noindex) and that none of the links on the page should be followed (nofollow). It is relatively easy to apply some simple programming logic to decide whether or not to include such a meta tag on the pages of your site. This approach is always applicable, as long as you have access to the source code of the application, whereas robots.txt exclusion may be difficult or even impossible to apply in certain cases.
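The "simple programming logic" mentioned above can be as small as one conditional in the page template. The following is a minimal sketch in Python, not code from the book; the names robots_meta_tag, render_head, and the is_duplicate flag are hypothetical, and how a page gets marked as a duplicate is left to the application.

```python
from html import escape


def robots_meta_tag(is_duplicate: bool) -> str:
    """Return the meta-exclusion tag for pages to exclude, else an empty string.

    `is_duplicate` is a hypothetical flag set by the application for pages
    that should stay out of the index (for example, a print-friendly copy).
    """
    if is_duplicate:
        return '<meta name="robots" content="noindex, nofollow" />'
    return ""


def render_head(title: str, is_duplicate: bool) -> str:
    """Build a <head> section, adding the exclusion tag only when needed."""
    parts = ["<head>", f"<title>{escape(title)}</title>"]
    tag = robots_meta_tag(is_duplicate)
    if tag:
        parts.append(tag)
    parts.append("</head>")
    return "\n".join(parts)
```

Calling render_head("Print view", True) would include the tag, while render_head("Product page", False) would leave it out, so regular pages remain indexable.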

To exclude a specific spider, change “robots” to the name of the spider, for example googlebot, msnbot, or slurp. To exclude multiple spiders, you can use multiple meta tags. For example, to exclude googlebot and msnbot:

<meta name="googlebot" content="noindex, nofollow" />
<meta name="msnbot" content="noindex, nofollow" />

Table 5-1 shows the common user agent names used by the various major search engines.
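Emitting one tag per spider is easy to automate. Here is a minimal sketch, assuming the lowercase agent names used in the text; exclusion_tags is a hypothetical helper and not part of any library.

```python
from html import escape


def exclusion_tags(agents=("robots",), directives=("noindex", "nofollow")):
    """Return one meta-exclusion tag per user agent name, one tag per line.

    The default ("robots",) addresses all spiders; pass specific names
    such as "googlebot" or "msnbot" to target individual ones.
    """
    content = escape(", ".join(directives), quote=True)
    return "\n".join(
        f'<meta name="{escape(agent, quote=True)}" content="{content}" />'
        for agent in agents
    )


# Exclude only Google's and MSN Search's spiders.
print(exclusion_tags(["googlebot", "msnbot"]))
```

For the two spiders named in the text, this prints one tag for googlebot and one for msnbot, each with the content "noindex, nofollow".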

In theory, this method is equivalent to the next method that is discussed, robots.txt. The only downside is that the page must be fetched in order to determine that it should not be indexed in the first place. This is likely to slow down indexing. Dan Thies also notes in The Search Engine Marketing Kit that “if your site serves 10 duplicate pages for every page of unique content, spiders may still give up indexing ... you can’t count on the search engines to fish through your site looking for unique content.”

As mentioned, two technical limitations are associated with using the meta-exclusion method:

- It requires access to the source code of the application. Otherwise, meta tag exclusion becomes impossible because the tag must be placed in the web pages generated by the application.

- It only works with HTML files, not with clear text, CSS, or binary/image files.

These limitations can be addressed by using the robots.txt file, which is discussed next. That said, robots.txt has limitations of its own. And if you do not have access to the source code of a web application, robots.txt is your only option.
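The two limitations of meta-exclusion can be expressed as a simple check. This is only a sketch of the reasoning above; can_use_meta_exclusion and its parameters are hypothetical names, and treating XHTML as HTML is an assumption made here rather than something the text states.

```python
# Content types that can carry a <head> section with a meta tag.
# Including XHTML here is an assumption of this sketch.
HTML_TYPES = ("text/html", "application/xhtml+xml")


def can_use_meta_exclusion(content_type: str, has_source_access: bool) -> bool:
    """Return True only when both limitations of meta-exclusion are satisfied.

    1. The application's source code is accessible, so the tag can be emitted.
    2. The resource is an HTML document, not clear text, CSS, or a binary file.
    """
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    return has_source_access and media_type in HTML_TYPES
```

When the check returns False, for instance for an image or for a closed-source application, robots.txt is the remaining option.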

Table 5-1

Search Engine    User Agent
Google           Googlebot
Yahoo!           Slurp
MSN Search       Msnbot
Ask              Teoma

Chapter 5: Duplicate Content
