
Duplicate Content Filter: What it is and how it works

Duplicate Content has become a huge topic of discussion lately, thanks to the new filters that search engines have implemented. This article will help you understand why you might be caught in the filter, and ways to avoid it. We'll also show you how you can determine if your pages have duplicate content, and what to do to fix it.

Search engine spam is any deceitful attempt to deliberately trick the search engine into returning inappropriate, redundant, or poor-quality search results. Often this takes the form of pages that are exact replicas of other pages, created to rank better in the search engine. Many people assume that creating multiple or similar copies of the same page will either increase their chances of getting listed in search engines or help them get multiple listings, thanks to the presence of more keywords.

To make search results more relevant to users, search engines use a filter that removes duplicate content pages from the search results, and the spam along with them. Unfortunately, good, hardworking webmasters have also fallen prey to these duplicate content filters. Those webmasters often spam the search engines unknowingly, even though there are things they can do to avoid being filtered out. To truly understand what you can do to avoid the duplicate content filter, you first need to know how the filter works.

First, we must understand that the term "duplicate content penalty" is actually a misnomer. When we refer to penalties in search engine rankings, we are talking about points that are deducted from a page to arrive at an overall relevancy score. But in reality, duplicate content pages are not penalized. Rather, they are simply filtered, the way you would use a sieve to remove unwanted particles. Sometimes, "good particles" are accidentally filtered out.

Knowing the difference between the filter and the penalty, you can now understand how a search engine determines what duplicate content is. There are basically four types of duplicate content that are filtered out:

1. Websites with Identical Pages - Pages that duplicate other pages are filtered, and a website that is identical to another website on the Internet is likewise considered spam. Affiliate sites with the same look and feel that contain identical content, for example, are especially vulnerable to a duplicate content filter. Another example would be a website with doorway pages. Many times, these doorways are skewed versions of landing pages that are identical to other landing pages. Generally, doorway pages exist to spam the search engines and manipulate search engine results.

2. Scraped Content - Scraped content is content taken from a website and repackaged to look different, but in essence it is nothing more than a duplicate page. With the popularity of blogs on the Internet and the syndication of those blogs, scraping is becoming more of a problem for search engines.

3. E-Commerce Product Descriptions - Many eCommerce sites use the manufacturer's descriptions for their products, descriptions that hundreds or thousands of other eCommerce stores in the same competitive markets are using too. This duplicate content, while harder to spot, is still considered spam.

4. Distribution of Articles - If you publish an article, and it gets copied and put all over the Internet, this is good, right? Not necessarily for all the sites that feature the same article. This type of duplicate content can be tricky, because even though Yahoo and MSN determine the source of the original article and deem it most relevant in search results, other search engines like Google may not, according to some experts.

So, how does a search engine's duplicate content filter work? Essentially, when a search engine robot crawls a website, it reads the pages, and stores the information in its database. Then, it compares its findings to other information it has in its database. Depending upon a few factors, such as the overall relevancy score of a website, it then determines which are duplicate content, and then filters out the pages or the websites that qualify as spam. Unfortunately, if your pages are not spam, but have enough similar content, they may still be regarded as spam.
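
Search engines do not publish the exact algorithms behind their filters, but the comparison step described above is often illustrated with standard near-duplicate detection techniques such as w-shingling with Jaccard similarity. The sketch below is purely illustrative; the function names and the 90% threshold are assumptions, not anything a search engine has confirmed:

```python
# Illustrative sketch of a duplicate-content comparison, using
# w-shingling with Jaccard similarity. Real search engine filters are
# proprietary; the threshold here is an assumption for demonstration.

def shingles(text, w=4):
    """Break text into the set of overlapping w-word sequences (shingles)."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(max(1, len(words) - w + 1))}

def similarity(page_a, page_b, w=4):
    """Jaccard similarity of the two pages' shingle sets (0.0 to 1.0)."""
    a, b = shingles(page_a, w), shingles(page_b, w)
    return len(a & b) / len(a | b)

def is_duplicate(page_a, page_b, threshold=0.9):
    """Pages scoring above the threshold would be filtered as duplicates."""
    return similarity(page_a, page_b) >= threshold
```

Two identical pages score 1.0 and are filtered; two pages with no phrasing in common score near 0.0 and both survive. Pages in between are where honest webmasters can accidentally get caught.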

There are several things you can do to avoid the duplicate content filter. First, you must be able to check your pages for duplicate content. Our Similar Page Checker will determine the similarity between two pages: enter the URLs of the two pages, and the tool will compare them and point out where they are similar, so that you can make them as unique as possible.

Since you need to know which sites might have copied your site or pages, you will need some help. We recommend using a tool that searches for copies of your page on the Internet: www.copyscape.com. Here, you can put in your web page URL to find replicas of your page on the Internet. This can help you create unique content, or even address the issue of someone "borrowing" your content without your permission.

Let's look at the issue of some search engines possibly not considering the source of the original content in distributed articles. Remember, some search engines, like Google, use link popularity to determine the most relevant results. Continue to build your link popularity, while using tools like www.copyscape.com to find out how many other sites have the same article; if the author allows it, you may be able to alter the article so as to make the content unique.

If you use distributed articles for your content, consider how relevant the article is to your overall web page and then to the site as a whole. Sometimes, simply adding your own commentary to the articles can be enough to avoid the duplicate content filter; the Similar Page Checker can help you make your content unique. Further, the more relevant articles you can add to complement the first article, the better. Search engines look at the entire web page and its relationship to the whole site, so as long as you aren't exactly copying someone's pages, you should be fine.

If you have an eCommerce site, you should write original descriptions for your products. This can be hard to do if you have many products, but it really is necessary if you wish to avoid the duplicate content filter. Here is another example of why the Similar Page Checker is a great idea: it can show you how to change your descriptions so as to have unique and original content for your site. This also works well for scraped content. Many scraped content sites offer news. With the Similar Page Checker, you can easily determine where the news content is similar, and then change it to make it unique.

Do not rely on an affiliate site that is identical to other sites, and do not create identical doorway pages. Not only are these types of pages filtered out immediately as spam, but when another site or page is found to be a duplicate, there is generally no comparison of the page to the site as a whole, which can get your entire site in trouble.

The duplicate content filter is sometimes hard on sites that don't intend to spam the search engines. But it is ultimately up to you to help the search engines determine that your site is as unique as possible. By using the tools in this article to eliminate as much duplicate content as you can, you'll help keep your site original and fresh.

The 10 Great SEO tips for your site

1 Content

This is the number one priority for any search marketing strategy: it is vitally important to ensure that you have content worth viewing. Without this one simple step to ensure there is a reason for someone to be on your site, everything else is useless. There are a lot of great sites to find inspiration for writing great content that works.

2 Incoming Links

A link is a link is a link, but without this simplest of building blocks you aren't going to do well in search engines. The more links you have, the more often you are going to be crawled. It is also important to make sure that you have the proper anchor text for your incoming links. The easiest way to gain quality links from other sites is to link to them, let them know your site is there, and hope for a reciprocal link. It is also important to make sure that you have content on your site that is worth linking to.
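
The anchor text mentioned above is simply the visible, clickable part of the link markup; a descriptive phrase tells both visitors and search engines what the target page is about. The URL and wording below are hypothetical:

```html
<!-- Weak anchor text: says nothing about the target page -->
<a href="https://www.example.com/blue-widgets">click here</a>

<!-- Descriptive anchor text: matches the target page's topic -->
<a href="https://www.example.com/blue-widgets">handmade blue widgets</a>
```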

3 Web site title

Making sure that you have the right titles for your pages is extremely important. The keywords you place in your title help ensure that your topic is understood by Google, and one of the primary ranking factors is whether the title is on-topic with the search query. The title is not only important for robots indexing and understanding the topic of the page; it also matters for click-through rates in the search results. Pay attention to what you click on when you are searching in Google; I know I don't always click the first result. Most of the time it is within the first page, but I skim through the titles to see which looks most on-topic for my search query. Great titles and topics on your site can bring you more traffic than a bare number-one listing.
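
As a quick illustration, the title lives in the page's head section; keeping the page's main keywords near the front helps both robots and searchers. The wording here is hypothetical:

```html
<head>
  <!-- On-topic title: main keywords first, branding after -->
  <title>Handmade Blue Widgets | Example Widget Co.</title>
</head>
```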

4 Heading tags

When you are laying out your site's content, be sure that you are structuring the content flow so that the heading tags are ordered by prominence. The most prominent, of course, is the h1 tag, which says: this is what this block of copy is about. Making sure you understand heading tag structure is very important. You only want one (or at most two) h1 tags per page, and it is important not to just throw anything into an h1 tag and hope you rank for it.
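
A heading structure ordered by prominence might look like this (all wording hypothetical):

```html
<h1>Handmade Blue Widgets</h1>

<h2>Why Our Widgets Are Different</h2>
<p>Copy about what sets the widgets apart...</p>

<h2>Widget Care and Maintenance</h2>
<p>Copy about looking after your widget...</p>
```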

5 Internal Linking

Making sure that your internal linking helps robots (and visitors!) find the content on your site is huge. Using relevant anchor copy throughout your site tells the robots (and visitors!) more effectively what to expect on the corresponding page. For pages you don't want to rank in Google, add a nofollow attribute so that the ranking flow of your site corresponds with your site's topic and interests. No one is going to be searching Google to find out what your terms of service or privacy policy are.
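
In the markup, this is the rel attribute on the link; the paths below are hypothetical:

```html
<!-- Normal internal link: descriptive anchor text, ranking signals flow through -->
<a href="/articles/widget-buying-guide">widget buying guide</a>

<!-- Pages you don't need ranked in Google -->
<a href="/terms-of-service" rel="nofollow">Terms of Service</a>
<a href="/privacy-policy" rel="nofollow">Privacy Policy</a>
```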

6 Keyword Density

Ensuring that you have the right keyword density for your page's and site's topic is paramount. You don't want to go overboard and use the keyword every fifth word, but making sure it comes up often will help you rank better in search engines. The unspoken rule is no more than 5% of the total copy per page; any more than this and it can start to look a little spammy. Granted, you aren't shooting for 5% every time. It is really all about context and relevance; just make sure it is good, quality copy.
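
As a rough self-check before publishing, you can count the keyword's share of the total copy yourself. This is a minimal sketch; the 5% ceiling above is a rule of thumb, not a published search-engine limit:

```python
import re

def keyword_density(copy, keyword):
    """Return the keyword's share of the total words, as a percentage.

    Counts whole-word matches only, case-insensitively.
    """
    words = re.findall(r"[a-z']+", copy.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word == keyword.lower())
    return 100.0 * hits / len(words)

# Example: 3 occurrences of "widget" in 9 words of copy is about 33%,
# well over the 5% rule of thumb.
density = keyword_density(
    "widget tips for widget buyers who love a widget", "widget")
```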

7 Sitemaps

It is always a good idea to give search engines a helping hand in finding the content on your site. Creating and maintaining a sitemap for all of the pages on your site will help the search robots find and index every page. Google, Yahoo, MSN and Ask all support sitemaps, and most of them offer a way to verify that your sitemap has been found. Most of the time you can simply name it sitemap.xml and the search robots will find the file.
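
A minimal sitemap.xml, saved at the site root, looks like this (the URLs are hypothetical):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
  </url>
  <url>
    <loc>https://www.example.com/products.html</loc>
  </url>
</urlset>
```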

8 Meta Tags

Everyone will tell you that meta tags don't matter; they do. The biggest thing they matter for, though, is click-through. Google will often use your meta description as the copy that gets pulled in with your search listing, which can help attract the visitor to your web site if it is related to their search query. It is definitely a much overlooked (as of late) factor. Getting indexed by search engines and ranking well is just the first step; the next, and biggest, step is getting the visitor who searched for your keywords to want to click on your listing.
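
The meta description sits in the head section alongside the title; write it for the searcher who will read it on the results page. The wording here is hypothetical:

```html
<head>
  <title>Handmade Blue Widgets | Example Widget Co.</title>
  <!-- Google often shows this text under your listing in the results -->
  <meta name="description"
        content="Handmade blue widgets, built to order and shipped worldwide. Browse the catalog and read our care guides.">
</head>
```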

9 URL Structure

Ensuring that your URL structure complements the content on the corresponding page is pretty important. There are various methods to make this work, such as mod_rewrite on Apache.
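
On Apache, mod_rewrite can map a clean, keyword-bearing URL onto the dynamic script that actually serves the page. The paths and parameter names below are hypothetical:

```apache
# .htaccess at the site root (assumes mod_rewrite is enabled)
RewriteEngine On

# Serve /widgets/blue-widget from the underlying product script
RewriteRule ^widgets/([a-z0-9-]+)/?$ product.php?slug=$1 [L,QSA]
```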

10 Domain

It can help to have the keywords you want to rank for within your domain, but only about as much as the title, headings and content matter. One factor that is coming to light is domain age: the older the site or domain, the more likely it is not spam, and the better it can do in search results. Domain age definitely isn't a make-or-break factor, but it does help quite a bit.