sites as well — and will sometimes make the wrong assumption as to which instance of the content is
the original, authoritative one.
This is an insidious problem in some cases, and it can have a disastrous effect on rankings. CopyScape (http://www.copyscape.com) is a service that helps you find content thieves by scanning other pages for content similar to that of a given page. Sitemaps can also help by getting new content indexed more quickly, thereby removing the ambiguity as to who the original author is. Sitemaps are discussed at length in Chapter 9.
If you are a victim of content theft and want to take action, first present the individual using the content illicitly with a cease-and-desist letter. Use the contact information provided on his web site or in the WHOIS record of the domain name. Failing that, the major search engines have procedures for alerting them to stolen content. Here are the URLs with directions for the major search engines:
❑ Google: http://www.google.com/dmca.html
❑ Yahoo!: http://docs.yahoo.com/info/copyright/copyright.html
❑ MSN: http://search.msn.com/docs/siteowner.aspx?t=SEARCH_WEBMASTER_CONC_AboutDMCA.htm
Unfortunately, fighting content theft is ridiculously time-consuming and expensive, especially if lawyers get involved. Doing so for all instances is probably unrealistic, and search engines generally do accurately assess who the original author is and display that version preferentially. In Google, the illicit duplicates are typically relegated to the supplemental index. However, it may be necessary to take this action in the unlikely case that the URLs with the stolen content actually rank better than yours.
Excluding Duplicate Content
When you have duplicate content on your site, you can remove it entirely by altering the architecture of the web site. But sometimes a web site has to contain duplicate content; the most typical scenario is when the business rules that drive the web site require it. To address this, you can simply exclude the duplicate content from the view of a search engine. Here are the two ways of excluding pages:
❑ Using the robots meta tag
❑ robots.txt pattern exclusion
In the following sections, you learn about the robots meta tag and about robots.txt.
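As a quick preview of the second method, here is a minimal robots.txt sketch. The /print/ directory is a hypothetical location for duplicate printer-friendly pages; the syntax itself is the standard robots.txt exclusion format:

# Hypothetical example: keep all spiders out of a directory
# of duplicate printer-friendly pages
User-agent: *
Disallow: /print/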
Using the Robots Meta Tag
This is addressed first, not because it’s universally the optimal way to exclude content, but rather because it has virtually no limitations as to its application. Using the robots meta tag, you can exclude any HTML-based content from a web site on a page-by-page basis, and it is frequently an easier method to use when eliminating duplicate content from a preexisting site for which the source code is available, or when a site contains many complex dynamic URLs.
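For example, here is a minimal sketch of a page excluded this way. The page itself is hypothetical, but noindex and nofollow are the standard directive values: noindex tells a spider not to add the page to its index, and nofollow tells it not to follow the page’s links.

<html>
<head>
<title>Printer-Friendly Product Page</title>
<!-- hypothetical page; the meta tag below keeps it out of the index -->
<meta name="robots" content="noindex, nofollow" />
</head>
<body>
<!-- duplicate content here -->
</body>
</html>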