Google and other search engines follow the Robots Exclusion Protocol, more commonly known as robots.txt. This protocol allows a webmaster to prevent search engine spiders (and other types of robots) from accessing particular web pages.
But what if you want to prevent search engines from indexing part of a page? You might want to do this if your page contains ads or other text that isn't really pertinent to the subject of the page. For example, Google has been known to build a search snippet for a Wikipedia article out of Wikipedia's annual fundraising banner rather than the article content itself.
That's not good for users, and it's not good for webmasters. Fortunately, there is an easy way to prevent this type of situation.
How to Block Part of a Page
First, you will need to understand how to block an entire page from being indexed. There are two methods:
1. Use the robots.txt file. Add code like this, replacing “somepage” with the actual name of your page:
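   User-agent: *
   Disallow: /somepage.html

The User-agent: * line applies the rule to all robots, and the Disallow line tells them not to crawl /somepage.html.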
2. Use the robots meta tag. Add this tag to the <head> section of the page you want to block:
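   <meta name="robots" content="noindex">

The noindex value tells search engines not to include the page in their index.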
Now, to get Google to exclude part of a page, you will need to place that content in a separate file, such as excluded.html, and use an iframe to display that content in the host page.
The iframe tag grabs content from another file and inserts it into the host page. Finally, use either method above to block search engines from indexing the file excluded.html.
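Here is a minimal sketch of how the pieces fit together, using the excluded.html file name from above (the width and height values are just placeholders to size the frame):

   <!-- In the host page, where the excluded content should appear: -->
   <iframe src="excluded.html" width="600" height="200"></iframe>

   # In robots.txt, to keep the iframe's source file out of the index:
   User-agent: *
   Disallow: /excluded.html

Alternatively, put the robots meta tag shown earlier in the <head> of excluded.html itself. Either way, visitors still see the content in place, but search engines leave it out of their index.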
Methods That Don’t Work Reliably
In the past, webmasters used JavaScript or Flash to hide content from search engines. As of 2014, this no longer works for Google, which can now crawl and index content generated by JavaScript or Flash. If your JavaScript or Flash content is not relevant to the topic of your page, use one of the more reliable methods above to keep it from being indexed.
Have questions or want more information? Please contact us today.