Duplicate content means substantive blocks of material that exist more than once on the web – in other words, substantive blocks of material that can be accessed via more than one URL. For example, if you write a lengthy reply on someone else’s blog or in a forum, and then publish this reply as a post on your own blog (with or without changes), that’s duplicate content.
Some bloggers are unduly concerned about this: influenced by hearsay or by some of those strange “SEO” (search engine optimization) sites, they think that the occasional duplicate content on their blog is a cardinal sin and the wrath of Google will fall upon them…
Re those sites, here’s a terse reply by Mark, head of WP support:
“I wouldn’t believe anything written on any SEO blog. Ever.”
Check what Google has to say on the subject:
Despite all the above, some sites continue to perpetuate the myth; some would even have you believe that wp.com blogs, in particular, come with a grave inherent problem: category pages.
Laconic reply by Mark again:
“You are fine – it’s how all WP blogs work and Google likes the way we do it.”
Yes, technically your category and other index pages are duplicate content, because their URLs lead to the same content that the post URLs do. But no, that’s not a detriment (on the contrary, categorizing your posts intelligently may actually improve your standing, because of the links from/to the global wp.com tag pages). What really happens is that when someone searches for words or phrases that would bring up a post of yours, Google will simply fetch it via one of the possible URLs and relegate the rest to the supplemental index. And that’s the way it should be: if you’re searching for something, you’d rather see ten different relevant articles on the first page of your Google search results, not the same article served ten times via its various categories and tags.
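To make the idea concrete, here is a toy sketch in Python of what "same content, several URLs, show only one" could look like. It is emphatically not Google's algorithm: the exact-match fingerprint, the "keep the first URL seen" rule, the function names and the example.wordpress.com URLs are all assumptions made up for illustration. Real category and tag pages also carry more than one post, so a real search engine has to do something far more sophisticated than this.

```python
import hashlib
from typing import Dict, List


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing whitespace,
    so trivial formatting differences do not count as new content."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def pick_representatives(pages: Dict[str, str]) -> Dict[str, List[str]]:
    """Group URLs by content fingerprint.

    Returns a mapping from the URL that is kept (hypothetical rule:
    the first one seen) to the other URLs that carry the same content.
    """
    groups: Dict[str, List[str]] = {}
    for url, text in pages.items():
        groups.setdefault(fingerprint(text), []).append(url)
    return {urls[0]: urls[1:] for urls in groups.values()}


# Made-up pages: one post reachable three ways, plus an unrelated post.
pages = {
    "https://example.wordpress.com/2008/01/15/my-post/": "Hello  World, this is my post.",
    "https://example.wordpress.com/category/news/": "hello world, this is my post.",
    "https://example.wordpress.com/tag/hello/": "Hello world, this is my post.",
    "https://example.wordpress.com/2008/02/01/other/": "A different article.",
}

for kept, hidden in pick_representatives(pages).items():
    print(kept, "stands in for", len(hidden), "other URL(s)")
```

The point of the sketch is only that the extra URLs get folded into one result; nothing in it penalizes the blog for having them.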
Most of what you can read re SEO refers to self-hosted blogs and other sites; wp.com blogs are as SEO friendly as it gets (for a variety of reasons, including built-in sitemaps and standardized URLs). And the truth about duplicate content is that if you are a normal blogger (that is, if you don’t systematically post the same articles on two different sites, you don’t systematically re-publish older posts, you don’t systematically post articles copied from other online sources, and you don’t try to deceive users and search engines), then you needn’t be concerned about this alleged problem.
1, 2: From Google Webmaster Central: Duplicate content summit at SMX Advanced.
3, 5: From Google Webmaster Central: Duplicate content due to scrapers.
4, 6: From Google Webmaster Central: Deftly dealing with duplicate content.