sites as well — and will sometimes make the wrong assumption as to which instance of the content is
the original, authoritative one.
This is an insidious problem in some cases, and it can have a disastrous effect on rankings. CopyScape (http://www.copyscape.com) is a service that helps you find content thieves by scanning other pages for content similar to that of a given page. Sitemaps can also help by getting new content indexed more quickly, thereby removing the ambiguity as to who the original author is. Sitemaps are discussed at length in Chapter 9.
If you are a victim of content theft and want to take action, first present the individual using the content illicitly with a cease-and-desist letter. Use the contact information provided on his web site or in the WHOIS record of the domain name. Failing that, the major search engines have procedures for alerting them to stolen content. Here are the URLs with directions for the major search engines:
❑ Google: http://www.google.com/dmca.html
❑ Yahoo!: http://docs.yahoo.com/info/copyright/copyright.html
❑ MSN: http://search.msn.com/docs/siteowner.aspx?t=SEARCH_WEBMASTER_CONC_AboutDMCA.htm
Unfortunately, fighting content theft is ridiculously time-consuming and expensive, especially if lawyers get involved. Doing so for all instances is probably unrealistic, and search engines generally do accurately assess who the original author is and display that version preferentially. In Google, the illicit duplicates are typically relegated to the supplemental index. However, it may be necessary to take this action in the unlikely case that the URLs with the stolen content actually rank better than yours.
Excluding Duplicate Content
When you have duplicate content on your site, you can remove it entirely by altering the architecture of the web site. But sometimes a web site has to contain duplicate content; the most typical scenario is when the business rules that drive the web site require it. To address this, you can simply exclude the duplicate content from the view of a search engine. Here are the two ways of excluding pages:
❑ Using the robots meta tag
❑ robots.txt pattern exclusion
In the following sections, you learn about the robots meta tag and about robots.txt.
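As a quick preview of the second method, here is a minimal robots.txt sketch. The /print/ directory is a hypothetical location for duplicate printer-friendly pages; the syntax itself is the standard robots.txt exclusion format:

# Hypothetical example: keep all spiders out of a directory
# of duplicate printer-friendly pages
User-agent: *
Disallow: /print/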
Using the Robots Meta Tag
This is addressed first, not because it’s universally the optimal way to exclude content, but rather because it has virtually no limitations as to its application. Using the robots meta tag, you can exclude any HTML-based content from a web site on a page-by-page basis, and it is frequently an easier method to use when eliminating duplicate content from a preexisting site for which the source code is available, or when a site contains many complex dynamic URLs.
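For example, here is a minimal sketch of a page excluded this way. The page itself is hypothetical, but noindex and nofollow are the standard directive values: noindex tells a spider not to add the page to its index, and nofollow tells it not to follow the page’s links.

<html>
<head>
<title>Printer-Friendly Product Page</title>
<!-- hypothetical page; the meta tag below keeps it out of the index -->
<meta name="robots" content="noindex, nofollow" />
</head>
<body>
<!-- duplicate content here -->
</body>
</html>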