Most webmasters want their sites crawled as frequently as possible, because regular crawling helps pages appear in search results and attract traffic. By default, search engines such as Google, Yahoo, and Bing may index any publicly reachable page. However, not every page or post needs to appear in search results; user-friendly error pages, confirmation pages, and thank-you pages are typical examples.
Unfortunately, WordPress only lets you switch search engine indexing on or off for the whole website, via Dashboard > Settings > Reading. It offers no built-in way to limit indexing for individual pages or posts. To help, we have put together this guide on keeping search engines away from a specific page or post by editing robots.txt, adding a “noindex” meta tag, or using a plugin.
Stop Search Engines by Using robots.txt
Locate the robots.txt file in the root directory of your site on the server. If you do not have one yet, create a new file and name it robots.txt. Since our website uses cPanel, we reach the file via cPanel > Files > File Manager. Open the file in an editor.
For instance, to stop a post called “BuddyPress Review” from being crawled, add the following rules to robots.txt, using the post's URL slug “buddypress-review” as the path. To disallow other posts or pages, add one Disallow line per URL slug.
User-agent: *
Disallow: /buddypress-review/
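Before uploading the file, it can help to check that a rule blocks what you expect. The sketch below uses Python's standard urllib.robotparser module to test the rule against two URLs. The domain example.com, the second URL, and the helper name is_blocked are placeholders for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /buddypress-review/
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# example.com is a placeholder domain, not taken from the article.
print(is_blocked(ROBOTS_TXT, "googlebot", "https://example.com/buddypress-review/"))
print(is_blocked(ROBOTS_TXT, "googlebot", "https://example.com/another-post/"))
```

The first call reports the post as blocked, while the second URL is not covered by any Disallow line and stays crawlable.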
You can also keep search engines out of a whole category. Here we take an image category as an example, using the following rules. Note that “*” means the instruction applies to all search engine crawlers.
User-agent: *
Disallow: /images-directory/
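When several posts and categories need blocking, typing each line by hand invites mistakes. As a rough sketch, the hypothetical helper below (build_robots_txt is our own name, not part of WordPress or any library) turns a list of URL slugs into a single User-agent group with one Disallow line per slug.

```python
def build_robots_txt(slugs, user_agent="*"):
    """Build robots.txt text with one Disallow line per URL slug."""
    lines = [f"User-agent: {user_agent}"]
    for slug in slugs:
        cleaned = slug.strip().strip("/")
        if not cleaned:
            # An empty slug would become "Disallow: /" and block the whole site.
            raise ValueError("empty slug would block the whole site")
        # Normalise to a leading and trailing slash, as in the examples.
        lines.append(f"Disallow: /{cleaned}/")
    return "\n".join(lines) + "\n"


print(build_robots_txt(["buddypress-review", "/images-directory/"]))
```

The helper refuses empty slugs on purpose, so a stray blank entry cannot silently produce a rule that blocks the entire website.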
However, if you only want to stop a particular search engine, such as Google, replace “*” with “googlebot”, as in the following rules. Keep in mind that bingbot refers to Bing, teoma refers to Ask, googlebot-image refers to Google Images, and googlebot-news refers to Google News.
User-agent: googlebot
Disallow: /images-directory/
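To see how a crawler-specific group behaves, the following sketch checks the same URL for two crawlers with Python's standard urllib.robotparser. The domain, the file name logo.png, and the helper name allowed_agents are assumptions made for the example, and this parser's user-agent matching is only an approximation of what each search engine does.

```python
from urllib.robotparser import RobotFileParser

# Only the googlebot group carries a Disallow rule.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /images-directory/
"""


def allowed_agents(robots_txt, url, agents):
    """Map each crawler name to True (may fetch url) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# example.com and logo.png are placeholders for illustration.
result = allowed_agents(
    ROBOTS_TXT,
    "https://example.com/images-directory/logo.png",
    ["googlebot", "bingbot"],
)
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With these rules, googlebot is reported as blocked and bingbot as allowed, because no group in the file applies to bingbot.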
The following robots.txt tells all search engine crawlers to stay away from the whole website.
User-agent: *
Disallow: /
Block Indexing with “noindex” Page Meta Tags
The “noindex” meta tag is an easy method that suits everyone, even those with little coding knowledge. Add the following meta tag to the <head> section of any page or post to keep it out of search results. Note that the name “robots” means the instruction applies to all search engines. Crawlers must be able to fetch the page to see the tag, so do not also block the same URL in robots.txt.
<meta name="robots" content="noindex">
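To confirm that a page really carries the tag, you can parse its HTML and look for it. The sketch below relies on Python's standard html.parser module. The class name RobotsMetaParser, the helper has_noindex, and the sample markup are our own illustrative assumptions, and the check treats a tag named “robots” as applying to every crawler.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the content directives of every named <meta> tag."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # meta name -> set of directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").strip().lower()
        content = (attr.get("content") or "").lower()
        if name:
            parts = {p.strip() for p in content.split(",") if p.strip()}
            self.directives.setdefault(name, set()).update(parts)


def has_noindex(html, crawler="robots"):
    """Return True if html tells the given crawler not to index the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # A crawler follows the generic "robots" tag as well as its own name.
    names = {"robots", crawler.lower()}
    return any("noindex" in parser.directives.get(n, set()) for n in names)


# Sample markup invented for this example.
SAMPLE = '<html><head><meta name="robots" content="noindex"></head></html>'
print(has_noindex(SAMPLE))
```

Passing a crawler name such as "googlebot" as the second argument also checks for a tag addressed to that crawler alone.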
If you want to block only a certain search engine, use that crawler's name in place of “robots”, as in the meta tag below. This example keeps the page/post out of Google only; other search engines may still index it.
<meta name="googlebot" content="noindex">
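You can also block more than one search engine by adding one meta tag per crawler, as below. This example keeps the page/post out of both Google and Bing. Each crawler needs its own tag, since the name attribute holds a single crawler name rather than a comma-separated list.
<meta name="googlebot" content="noindex">
<meta name="bingbot" content="noindex">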
Block Search Engines with WordPress Plugin
The WordPress plugin directory offers a large number of plugins for blocking search engines, such as WordPress SEO by Yoast, WordPress Meta Robots, and PC Hide Pages. Among them, WordPress SEO by Yoast is the most popular, with a bundle of advanced features.
If you are new to this field and know little about robots.txt and meta tags, a robots meta plugin is a great option for you.
Plugin URL: https://wordpress.org/plugins/wordpress-seo/