WordPress is SEO-friendly out of the box. Even so, a WordPress blog can still run into issues that keep it from ranking well in search engines like Google. Among these issues, one of the most severe and most common is duplicate content.
Without intending to, you may have the same content accessible through two or more different URLs. For example, if your blog is not well optimized, readers might be able to reach the same post through its own permalink and also through category or tag URLs.
Since all these links lead to the same content, they don’t affect your user experience. However, Google treats them as different pages, so it has to guess which link to show when the related keyword is searched. In the end, Google picks one link and hides the rest from the search results, and your site's rankings can suffer because of the duplicate content.
Several things can lead to the duplication problem. Below, we discuss the common causes of the WordPress duplicate content issue and show how to fix each of them.
Cause 1: Your Authors Have Created Duplicate Content
This is not a technical cause, but it does exist on some blogs, especially those with a lot of content. Sometimes you or your authors may include the same code snippets, quotes or sentences in different posts just to make things easier to understand. This is understandable when you write on similar or related topics. However, the repeated content will be detected by Google and can cause a duplicate content issue. The same thing happens when other sites accidentally use the same quotes and code snippets as you.
Solution: There are several ways to find duplicate content inside your blog. For example, by using Google Search Console (formerly Google Webmaster Tools), you can get information about duplicate title tags and meta descriptions by clicking Search Appearance > HTML Improvements.
For duplicate post content, you will need a duplicate content checker like CopyScape. By installing the CopyScape plugin on your WordPress site, you can check posts and pages for plagiarism at any time with a single click.
If any duplicate content is found, the matches will be displayed and the duplicated passages highlighted for easy recognition. You can then modify the content before publishing the post or page to prevent possible bad SEO effects.
If you have added a large amount of content to your WordPress site, we suggest you check each existing post with CopyScape when you use this tool for the first time. This will help resolve most of the plagiarism issues.
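To get a feel for what a duplicate checker looks for inside your own blog, here is a minimal Python sketch that compares posts by their overlapping three-word sequences (often called shingles) and reports pairs that look very similar. It is not how CopyScape works internally; the function names, the sample posts and the 0.5 threshold are all assumptions made for illustration.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short texts are treated as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b, size=3):
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_similar_posts(posts, threshold=0.5):
    """posts maps a title to its body text.

    Returns (title_a, title_b, score) for every pair at or above the
    threshold. The threshold is an arbitrary starting point, not a
    value any search engine is known to use.
    """
    titles = sorted(posts)
    pairs = []
    for i, first in enumerate(titles):
        for second in titles[i + 1:]:
            score = jaccard_similarity(posts[first], posts[second])
            if score >= threshold:
                pairs.append((first, second, round(score, 2)))
    return pairs


if __name__ == "__main__":
    sample_posts = {  # hypothetical post bodies
        "Post A": "Install the plugin and activate it from the dashboard.",
        "Post B": "Install the plugin and activate it from the dashboard.",
        "Post C": "Scrapers copy content from other blogs without permission.",
    }
    for first, second, score in find_similar_posts(sample_posts):
        print(first, "and", second, "look similar:", score)
```

A check like this only covers content you already have on hand. It cannot find copies elsewhere on the web, which is where a service like CopyScape comes in.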
Cause 2: Your Content Has Been Stolen by Scrapers
Although most bloggers are devoted to creating valuable and original content to gain an audience, some bad guys simply build a scraper site and steal content from other blogs without any permission. When your posts are stolen by scrapers, they might lose the rankings they should have received because of the duplicate content on the Internet.
The duplication is not your fault, but unfortunately, you will still lose both search engine rankings and valuable traffic. Besides, the more popular your blog is, the more scrapers you will come across.
Find the scraper sites stealing your content
To see whether your content has been copied by scrapers or anyone else, you can check your posts and pages with CopyScape. This tool will detect the matches to your content, and once you find a website that steals your content, note it down.
What to do with the scraper sites?
After locating the copied content on other sites, you can ask Google to remove the links from the web index by filing a DMCA complaint. To do so, you need to complete some steps.
First, visit the DMCA page on Google and choose to submit a legal request.
Open the DMCA tool by following the link. On the next page, select “Web Search” as the product.
After completing the other options required by Google and confirming that you are the copyright owner, you will get a DMCA notice form where you can report the links with copied content. There, you need to enter the URL of the original post on your site, and then provide the URLs of all the infringing materials. Multiple URLs can be reported at one time.
After submitting the DMCA notice, you may need to wait 2-10 days for the copied content to be removed from Google search results. With successful removal, you will be able to get the stolen traffic and rankings back soon.
Cause 3: Some Unwanted Content Has Been Indexed Accidentally
You may never have imagined that small misconfigurations on your blog can produce a lot of duplicate content for search engines to find. As the example at the beginning of this post shows, taxonomies that shouldn’t be indexed, such as tags and categories, can lead to severe SEO problems within your blog.
To see whether Google has indexed taxonomies and other pages that cause duplication, you can open Google and search your site with “site:www.yourdomain.com”. All the URLs that have been indexed will be displayed, so you can browse the links and find out the ones that should be excluded from the search results.
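If the “site:” search returns a long list, sorting the URLs by type makes the problem pages easier to spot. The Python sketch below groups URLs by their path. It assumes WordPress's default URL bases (“/category/”, “/tag/”, “/author/”) and date archives that look like “/2016/05/”. If you have changed your permalink settings or installed WordPress in a subdirectory, the patterns would need adjusting. The function names are hypothetical.

```python
import re
from urllib.parse import urlparse

# Assumed patterns, based on WordPress's default URL bases.
ARCHIVE_PATTERNS = [
    ("category", re.compile(r"^/category/")),
    ("tag", re.compile(r"^/tag/")),
    ("author", re.compile(r"^/author/")),
    # Year, year/month or year/month/day with nothing after it.
    ("date", re.compile(r"^/\d{4}(/\d{2}){0,2}/?$")),
]


def classify_url(url):
    """Return the archive type a URL appears to belong to, or "other"."""
    path = urlparse(url).path
    for label, pattern in ARCHIVE_PATTERNS:
        if pattern.search(path):
            return label
    return "other"


def group_indexed_urls(urls):
    """Group a list of URLs by the label classify_url gives them."""
    groups = {}
    for url in urls:
        groups.setdefault(classify_url(url), []).append(url)
    return groups


if __name__ == "__main__":
    indexed = [  # URLs copied by hand from a "site:" search (made up here)
        "https://www.yourdomain.com/sample-post/",
        "https://www.yourdomain.com/category/seo/",
        "https://www.yourdomain.com/tag/wordpress/",
        "https://www.yourdomain.com/author/admin/",
        "https://www.yourdomain.com/2016/05/",
    ]
    for label, urls in group_indexed_urls(indexed).items():
        print(label, len(urls))
```

Anything that lands in a group other than “other” is a candidate for the fixes described in the rest of this section.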
Below are some of the common parts that you need to stop Google bots from indexing. If you have found any of them when searching your site on Google, take measures now to fix the problem.
Categories and tags
By default, your category and tag pages are indexed. However, if they are set up to include complete posts, they can cause a serious duplicate content issue because Google will find the same content in multiple locations, especially when you have assigned more than one category to a single post.
If any category or tag page is listed in Google search results with post content displayed, we suggest changing some SEO configurations by using a WordPress SEO plugin.
For example, if you are using Yoast SEO, you need to go to SEO > Titles & Metas, open the “Taxonomies” tab, and then no-index your categories and tags. By doing so, those taxonomies will disappear from your search results.
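Once the setting is saved, you may want to confirm that it took effect. No-indexing is normally done with a robots meta tag in the page's head section, so one way to check is to view the source of a category or tag page and look for it. Here is a minimal Python sketch that does the same check on a page's HTML. The exact tag your plugin prints may differ slightly, and the class and function names are our own.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def is_noindexed(html):
    """Return True if the page's robots meta tag contains "noindex"."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    # Hypothetical head section of a category page after the change.
    page = '<html><head><meta name="robots" content="noindex, follow" /></head></html>'
    print(is_noindexed(page))
```

Keep in mind that a page only drops out of the search results after Google has crawled it again and seen the tag, so the change is not instant.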
Author archives and other archives
WordPress has two built-in archive types: author archives and date-based archives. You can also create custom archive pages at any time.
The archive feature is useful for delivering a good user experience, but search engines treat archive pages as additional locations of your post content. Depending on the theme settings, archive pages may include only excerpts instead of complete posts, but a duplicate content issue can still arise.
To prevent this issue from happening, you can no-index your archives in the archive settings in Yoast SEO or other SEO plugins.
Image attachment pages
When images are added to your posts, they come with their own attachment URL. Clicking on this URL, you will be led to the attachment page instead of the parent post. These attachment pages could be indexed by Google.
The indexing of such pages could cause the following problems.
- The useless pages mess up your search results.
- If not dealt with carefully, the attachment pages can produce duplicate content from the parent post.
To solve this problem, disable the attachment pages by going to SEO > Advanced > Permalinks and redirecting attachment URLs to the parent post (assuming you are using Yoast SEO).
Special Tip: Use Canonical URLs to Stop Duplicate Content Issue
Besides categories, tags and archives, many other things can generate different URLs pointing to the same content, such as comment pagination. To prevent most of these “other” problems, you can set up canonical URLs to tell search engine bots which link to index. Most SEO plugins like Yoast SEO add canonical URLs to posts automatically.
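A canonical URL is declared with a link tag that has rel="canonical" in the page's head section. To see which URL a page declares, you can view its source, or use a small Python sketch like the one below. The sample HTML and the names used here are made up for illustration.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        # rel can hold several space-separated values.
        if "canonical" in attributes.get("rel", "").lower().split():
            self.canonical = attributes.get("href") or None


def find_canonical_url(html):
    """Return the canonical URL a page declares, or None if it has none."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical


if __name__ == "__main__":
    # Hypothetical head section of a paginated comment page.
    page = (
        "<html><head>"
        '<link rel="canonical" href="https://www.yourdomain.com/sample-post/" />'
        "</head></html>"
    )
    print(find_canonical_url(page))
```

If two different URLs of the same post both report the same canonical URL, search engines have what they need to treat them as one page.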
Canonical URLs can help you eliminate the WordPress duplicate content issue generated by all of the following causes.
- Different comment pages (“/comment-page-1/”, “/comment-page-2/”, etc.) for a single post when comment pagination is enabled.
- URLs with and without “www” that both lead to the same content.
- URL parameters used for tracking, sorting or any other purpose. URLs with these parameters look different to search engines, but they actually contain the same content.
- Printer-friendly versions of your pages. If these pages are created on your WordPress site and not blocked from search engines, they cause duplication.
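To illustrate how the variations listed above collapse into a single address, here is a rough Python sketch that maps several URL variants to one preferred form. It is only a way to reason about which URLs should share a canonical, not what an SEO plugin actually does. The domain, the “utm_” tracking prefix, and the “print” and “orderby” parameter names are assumptions; replace them with whatever your own site uses.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PREFIXES = ("utm_",)            # assumed tracking parameters
IGNORED_PARAMS = {"print", "orderby"}    # hypothetical print/sort parameters
COMMENT_PAGE = re.compile(r"/comment-page-?\d+/?$")


def canonicalize(url, preferred_host="www.yourdomain.com"):
    """Map a URL variant to the form we would want indexed."""
    parts = urlsplit(url)

    # Treat the "www" and non-"www" hosts as the same site.
    host = parts.netloc.lower()
    bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    if host in (bare, "www." + bare):
        host = preferred_host

    # Paginated comment pages belong to the post itself.
    path = COMMENT_PAGE.sub("/", parts.path)

    # Drop parameters that do not change the content.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in IGNORED_PARAMS
    ]

    # The scheme (http or https) is left as it was; fragments are dropped.
    return urlunsplit((parts.scheme, host, path, urlencode(query), ""))


if __name__ == "__main__":
    variants = [
        "https://yourdomain.com/sample-post/",
        "https://www.yourdomain.com/sample-post/comment-page-2/",
        "https://www.yourdomain.com/sample-post/?utm_source=newsletter",
        "https://www.yourdomain.com/sample-post/?print=1",
    ]
    for variant in variants:
        print(variant, "->", canonicalize(variant))
```

Parameters that do change the content, such as a search query, are deliberately kept, since those pages are not duplicates of the post.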