
On the duplicate content myth

http://wpbtips.wordpress.com/

Duplicate content means substantive blocks of material that exist more than once on the web – in other words, substantive blocks of material that can be accessed via more than one URL. For example, if you write a lengthy reply in someone else’s blog or in a forum, and then publish this reply as a post in your own blog (with or without changes), that’s duplicate content.

Some bloggers are unduly concerned about this: influenced by hearsay or by some of those strange “SEO” (search engine optimization) sites, they think that the occasional duplicate content on their blog is a cardinal sin and the wrath of Google will fall upon them…

Re those sites, here’s a terse reply by Mark, head of WP support:

I wouldn’t believe anything written on any SEO blog. Ever.

Check what Google has to say on the subject:

Google wants to serve up unique results and does a great job of picking a version of your content to show if your site includes duplication. If you don’t want to worry about sorting through duplication on your site, you can let us worry about it instead. [1]

Duplicate content doesn’t cause your site to be penalized. If duplicate pages are detected, one version will be returned in the search results to ensure variety for searchers. [2]

In the majority of cases, having duplicate content does not have negative effects on your site’s presence in the Google index. [3]

In the vast majority of cases, the worst thing that’ll befall webmasters is to see the “less desired” version of a page shown in our index. [4]

Only when there are signals pointing to deliberate and malicious intent, occurrences of duplicate content might be considered a violation of the webmaster guidelines. [5]

In the rare cases in which we perceive that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we’ll also make appropriate adjustments in the indexing and ranking of the sites involved. However, we prefer to focus on filtering rather than ranking adjustments. [6]

Despite all the above, some sites continue to perpetuate the myth; some would even have you believe that wp.com blogs, in particular, come with a grave inherent problem: category pages.

Laconic reply by Mark again:

You are fine – it’s how all WP blogs work and Google likes the way we do it.

Yes, technically your category and other index pages are duplicate content, because their URLs lead to the same content post URLs do. But no, that’s not a detriment (on the contrary, categorizing your posts intelligently may actually improve your standing, because of the links from/to the global wp.com tag pages). What really happens is that when someone searches for words or phrases that would bring up a post of yours, Google will simply fetch it in one of the possible ways, dumping away the rest into the supplemental index. And that’s the way it should be: if you’re searching for something, you’d rather see ten different relevant articles on the first page of your Google search results, not the same article served ten times via its various categories and tags.
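
If you’d like to see that “one preferred version” behaviour for yourself: single post pages on WordPress blogs normally carry a rel="canonical" link element in their head, naming the post’s own permalink as the address to index, no matter which category or tag page led you to it. Here’s a minimal sketch of how you might peek at it; the blog and post addresses are placeholders, not real sites:

    # Minimal sketch: print the rel="canonical" URL each page declares in its head.
    # The addresses below are placeholders for illustration, not real blogs.
    import re
    import urllib.request

    urls = [
        "https://example.wordpress.com/2011/03/13/some-post/",    # the post itself
        "https://example.wordpress.com/category/some-category/",  # an archive page listing it
    ]

    canonical = re.compile(
        r'<link[^>]*rel=["\']canonical["\'][^>]*href=["\']([^"\']+)["\']', re.I
    )

    for url in urls:
        html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
        match = canonical.search(html)
        print(url, "->", match.group(1) if match else "(no canonical link found)")

If the post names its own permalink while the archive pages merely link to it, that’s the kind of signal that lets Google pick one version and quietly set the rest aside, just as the quotes above describe.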

So…
Most of what you can read re SEO refers to self-hosted blogs and other sites; wp.com blogs are as SEO friendly as it gets (for a variety of reasons, including built-in sitemaps and standardized URLs). And the truth about duplicate content is that if you are a normal blogger —that is, if you don’t systematically post the same articles in two different sites, you don’t systematically re-publish older posts, you don’t systematically post articles copied from other online sources, and you don’t try to deceive users and search engines— then you needn’t be concerned about this alleged problem.
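
About those built-in sitemaps: wp.com blogs generally expose one at /sitemap.xml, which search engines read to find your posts without any fiddling on your part. If you’re curious what yours lists, here’s a quick illustrative sketch; example.wordpress.com is a placeholder for your own blog address:

    # Minimal sketch: list the entries of a blog's built-in sitemap.
    # "example.wordpress.com" is a placeholder; use your own blog address.
    import urllib.request
    import xml.etree.ElementTree as ET

    sitemap_url = "https://example.wordpress.com/sitemap.xml"
    data = urllib.request.urlopen(sitemap_url).read()
    root = ET.fromstring(data)

    # Collect every <loc> entry; this works whether the file is a plain
    # sitemap or a sitemap index pointing to sub-sitemaps.
    ns = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
    locs = [el.text for el in root.iter(ns + "loc")]

    print(len(locs), "entries, for example:")
    for loc in locs[:5]:
        print(" ", loc)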

__________________
1, 2: From Google Webmaster Central: Duplicate content summit at SMX Advanced.
3, 5: From Google Webmaster Central: Duplicate content due to scrapers.
4, 6: From Google Webmaster Central: Deftly dealing with duplicate content.

Recap here: Google Webmaster Central: Demystifying the “duplicate content penalty”.

http://wpbtips.wordpress.com/


Discussion

21 thoughts on “On the duplicate content myth”

  1. Thanks for this. I’d suspected that was a myth, but never bothered to research it. All my WordPress sites, .org and .com, seem to get about as much Google love as they deserve, and I’ve always avoided or altered themes that only post excerpts to category, tag and author pages. My attitude was: screw SEO, what do the majority of users need?

    Posted by Dave Bonta | March 13, 2011, 19:45
  2. … substantive blocks of material that can be accessed via more than one URLs.

    more than one URL. (singular)

    – – —

    …and search engines—, then you needn’t be concerned about this alleged problem.

    Please: a dash or a comma, but not both!
    En dash or em dash? I’m not sure but I’d probably go with “—” because of dear Emily D—
    though she seems to have developed a unique syntax for that punctuation mark—

    More standard punctuation would be the use of a comma between the if-then statements.

    Posted by Tess | March 13, 2011, 20:24
  3. @Dave Bonta,
    I don’t understand why you would avoid themes which have an enabled excerpt function. Especially with category and tag pages, excerpts allow your readers to see your custom summary of each post; they can decide quickly whether to read or skip an entire post. If you can summarize a post effectively, then your readers don’t have to read and scroll down and down through a number of whole posts until they find what they are looking for.
    That is just my own opinion, and I admit a fondness for custom excerpts for the reasons stated.

    Posted by Tess | March 13, 2011, 20:35
  4. @Dave: Agree with the screw part. As I said to the commenter who first asked me about this, you write for people, not for search engines.

    @Tess: Hihi, I’m incorrigible:

    http://wpbtips.wordpress.com/2009/05/17/audio-player-additional-options/#comment-174

    Posted by Panos | March 13, 2011, 20:58
  5. LOL. Thanks!

    Posted by Tess | March 13, 2011, 21:11
  6. @Tess – A fair question. I don’t think there’s one right answer; it depends on audience. At qarrtsiluni, for example, where my name here is linked, we use the categories to organize themed “issues,” so they are a major way people browse content. In general, with all my sites, I am not for the most part offering informational posts, which is where I think I would consider your approach.

    Posted by Dave Bonta | March 13, 2011, 21:14
  7. Yes, I am doing a cooking-blog, which is informational in terms of people wanting to find specifics about how to cook various Japanese foods. (though I intersperse personal content and comments and colors with each post—even when I re-post the same recipes)

    So yes again, perhaps you are right about not wanting excerpts.

    I wonder, though, if it would be a nice widget idea to have an excerpt or the beginning lines of a post shown at random? Sort of like a featured post but selected randomly—when a reader refreshed the page, a new potentially interesting post would appear. ??

    Posted by Tess | March 13, 2011, 22:01
  8. Yeah, that would be cool. We can’t do that at qarrtsiluni since it’s WordPress.com — though I do have the Random link in the top navigation bar — but I should consider it for some of my .org sites. I have been eyeing up the Random Posts from Category widget by Stephanie Leary, as a matter of fact.

    The question for me always is how to have just enough things in the sidebar without making it so cluttered that people tune it out altogether, and what really needs to be there vs. in a page linked from the nav bar, etc. Balancing the needs of regular vs. first-time visitors is another conundrum.

    Posted by Dave Bonta | March 13, 2011, 23:34
  9. “Random Posts from Category widget by Stephanie Leary”
    Is that for wp.org only? Sorry I should just google it…

    oh, sidebar clutter and 1st vs regular and what I myself want to use…

    Posted by Tess | March 14, 2011, 07:04
  10. Yeah, ‘fraid so. http://sillybean.net/code/wordpress/random-posts-from-category/ Maybe if we agitated for it, Automattic would consider adding it here. It looks as if it will do what you want. I’ll give it a try on my main .org site.
    (I’m a huge fan of Leary. Her book Beginning WordPress 3 is the best comprehensive introduction to the platform I’ve found, albeit aimed at an audience that already knows PHP. And she’s written a raft of useful plugins.)

    Posted by Dave Bonta | March 14, 2011, 15:08
  11. There are many cases of duplicate content that I observe frequently. They have a different nature from what you point to in this post, which contains information I am well aware of. Yes, I would love to be able to use robots.txt to block search engines from various pages on my wordpress.com free hosted blogs, as I can do on my wordpress.org install. But let me point out some other types of duplicate content I encounter.

    Example 1: The cases I refer to here are those who create a blogspot (Blogger) blog and enter the Google Adsense program. Time passes and they then set up a self-hosted wordpress install and equip it with Adsense. They proceed to import all the content from the blogspot blog into the wordpress.org install, and thereafter they publish exactly the same content on the blogspot (Blogger) blog and also on the wordpress install. Did these sites suffer penalties? I only know what I witnessed. The bloggers failed to treat duplicate content in the manner Google indicated, and I saw their pagerank on the original site drop like a stone. I saw them posting to other blogging forums and questioning why that was happening to the original blog and why their new blog had not earned a pagerank long after being established.

    Example 2: In this case I am pointing to content farms. I also saw those who publish the same content in content farms and then again in their own blogs. Their blogs which existed for more than 6 months did not achieve a pagerank.

    Posted by timethief | March 19, 2011, 20:45
  12. I am well aware that you’re well aware of this information! You’re also well aware that many of my posts spring from questions in the forum and try to serve as answers to those questions. Their target is the average user – in this case the “normal” blogger who may have been unnecessarily alarmed or misled by strange stories.* Your examples illustrate precisely one of the categories I contrasted with “normal” blogging in the last paragraph of the post: to wit, systematically posting the same content in two sites.** If you do get penalized for that, I’d say: very rightly so!

    *In fact this post is a rehash of a series of replies to a commenter here:

    http://wpbtips.wordpress.com/2009/03/25/full-posts-in-archive-pages/#comment-6640

    **Speaking of which, I would be very interested in your opinion re point 2 in this exchange:

    http://wpbtips.wordpress.com/2009/04/01/codes-useful-for-text-widgets/#comment-8605

    Posted by Panos | March 19, 2011, 23:33
  13. Well, before you laugh too much at the scumbags who really, really use duplicate content, I think you should visit some black-hat SEO forums and such. That tactic of having no original content on many, many sites actually does work. Autoblogs or autoblogging is a name for them. Scrapers, basically. There are loads and loads. I have complained about a couple to Google. They have a form/page for this. Some are based on Yahoo Answers and scripts attaching, for example, Youtube comments etc. Anything that can be automated via RSS feeds is used. Sites can be quite big but are run by a robot, so to speak. High-priced scripts/software are the miracle weapon here. The most important content is of course AFFILIATE links to Amazon, Ebay and so on. We are over in malware territory; it’s all about the money.

    You can surely find lousy, failing attempts, but remember much of the work is automated, they have many, many sites, and they do not all fail. Far from it. People who “work” with this are often tireless in the hunt for easy/automated solutions for tricking Google – they will continue to find new ways, over and over. Duplicate content, stolen from other sites (could be this one!), is their necessary cover.

    I understand the topic here is more about the usual chitchat that needs to be put to rest, but duplicate content is a major problem in other contexts. It’s one of the reasons Google changes its algorithms over and over. They have big problems dealing with it. Some say Google does not care one bit because ADSENSE is always around fake sites. Google in bed with douche-bag marketers? Hmm, money does tend to rule the world. Well, never mind, Google’s problem might be connected to what has been said here as well. They generally do not mind duplicate content, but then again sometimes they should. How to differentiate when the decider is an algorithm/bot? A cat-and-mouse game with this black SEO, as always.

    WordPress.org could help a little by cleaning up some plugins that are targeted at this area. I would include obfuscated affiliate-link stuff as well. They host some questionable plugins. I see no reason for that.

    Posted by dk70 | March 20, 2011, 06:55
  14. Btw, one of the sites I complained about used to be no. 27 in a Google search. After algorithm change it went to high 30s. Right now it is no. 22. I know these numbers can be a bit elastic depending on where you search from etc. but these fake sites do rank fairly high. In just about any subject area! Big business.

    Posted by dk70 | March 20, 2011, 07:13
  15. Yes, I think you’re right. TT knows more about these things than me, but I think scrapers can achieve a good rank because of the overwhelming amount of posts, and I think they live long and prosper (for the noble reason you mentioned) unless you file a DMCA notice.

    And yes, I’ve seen many of my posts in several scrape sites. Since most of these sites rely on the RSS feeds, at least I’ve made sure I’ve set mine to summaries instead of full posts. Plus all my posts start with an invisible link to my blog, which becomes visible in the feed.

    Posted by Panos | March 20, 2011, 13:43
  16. It’s a bit depressing the more the topic is investigated, and if you are a very active user of Google you also know that what they say on their official blog, or via Matt Cutts, is not always correct. There are tons of “noise” that should not be there, or not be so visible. Google does NOT censor stuff, but there is the front of the bus and there is the back of the bus :) They are mighty clever, so they know all about these blackhat tricks. I think their problem is they can’t change much without risking hurting innocent pages. They are too big for their own good, perhaps. They trust machinery too much as well. Manual/intelligent intervention could easily clean up the index and penalize away, but they seem to trust their holy algorithm. When I made that complaint I wrote probably 2 pages – every detail of the site, links and all, proving there was not 1 single original word, only a bunch of affiliate links. It is an autoblog. The guy at Google reading that (if anyone ever did!) should be able to go OK, hold on to your hats and watch xxxxxxx..xxx drop positions. We don’t like to penalize, but we also do not like to award crap just because keywords are yummy. They don’t do this just because of complaints.

    Well, your blog is not obvious material for easy-to-sell products. I think the danger here could be someone manually stealing content – for a blog doing much the same as yours. The WordPress world is not without “creative” site admins, I have noticed that already. Probably a wise approach to be prepared. As with spammers, the majority are not so difficult to get rid of. Ignoring them, believing humanity is way cool, is never a good idea.

    Anyway, I believe I have hijacked the thread, but internal duplicate content is silly to worry about; external duplicate content is one of the reasons many feel the internet is “noisy”. That is the polite expression.

    Posted by dk70 | March 20, 2011, 15:17
  17. @dk70
    I hear you and I can confirm everything you are saying. I chose not to post a rant here, but believe me when I say I could have. I have also complained and complained and complained. The notion that DMCA complaints must be used is sheer lunacy. For GOB’s sake, the scrapers post less than 33 words from a single stolen post to escape detection. And don’t even get me started on the word-salad routine they use. I published a single, simple and very polite post for my readers and left it at that. If I had opened up, either in my post or in this one (which made my blood pressure soar when I read it), I could have written for hours, or at least for as long as it took me to bring on a stroke.

    http://onecoolsitebloggingtips.com/2010/12/14/duplicate-content-in-the-serps-sucks/

    Posted by timethief | March 22, 2011, 01:41
  18. Nice site for when I get reading glasses on :) Bookmarked.

    Yes, there is much to say about this. Companies’ policing of their affiliate deals is another topic (good site or bad site – either way they collect, like Google with ads); hidden, non-transparent affiliate deals go hand in hand with stolen content and the Google mess. Affiliate deals are a good thing in my mind – but not when the basis is bogus. I guess those who really should be annoyed are sites trying hard to make a legit business from their services/content/products. Same with ads. Sites that overdo them make people install adblockers, and then all sites suffer, since not that many make use of the whitelist feature, if it exists. The battle starts and the methods get more “creative”.

    Posted by dk70 | March 22, 2011, 22:32
  19. I have seen the effect duplicate content will have on one’s website…

    Not sure of the overall effect on blogs, but it will have a considerable impact on static web sites.

    Google will basically ignore the pages with duplicate content by either not indexing them or deindexing them.

    Unique content is, and always should be, the driving force for relevancy.

    [Username link to commercial website removed - P.]

    Posted by Larry | March 26, 2011, 00:51
  20. Possibly Larry, but I don’t think you can say much referring to X unknown site having this or that experience. Every time Google change their algorithms you can see people complain – which just tells you trusting algorithms 100% is wrong ;) As is also testified by the many many obviously search engine manipulating sites that are available for clicks.

    Posted by dk70 | March 31, 2011, 21:54
  21. @dk70: Maybe L was more interested in planting a link to his site than saying anything of any use or relevance to wp.com blogs (which is why I removed his username link).

    Posted by Panos | April 1, 2011, 11:11
