We’ve all seen the movies where the robots come for us, including classics such as The Terminator and The Matrix. What people may not realize is that they already came for us, and got all of our information… In fact, the most widely used website in the world is built on one of these robots: Googlebot.
Indeed, not all robots are here to subjugate humanity. Many are actually quite helpful, indexing websites for popular search engines so that visitors can find and enjoy our pages. Without these robots, much of the information revolution of the past 20 years would not have been possible.
But what if your website contains information you’d rather the whole world didn’t see? Perhaps a baby picture you only share with family and friends, or internal pages you want to keep out of the search results for business reasons. There are many valid reasons for “banning” the bots from certain pages, and there are some good ways to do it.
One answer is a robots.txt file. Essentially, this is a plain text file (which can be written in any text editor) that tells robots which portions of a website they are allowed to visit. The basic syntax is fairly simple, and a good overview is available here. We want to be very careful when employing these files, however, and make absolutely sure we know what effects our changes will have. For this reason, many webmasters are uncomfortable editing the file themselves, as one small mistake could render a site entirely invisible (or entirely visible) to any robot.
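To make this concrete, here is a minimal sketch of what such a file might look like. The directory names are hypothetical; the rules simply pair a User-agent line (which robots the group applies to) with Disallow lines (which paths they should skip):

```
# Rules for all robots
User-agent: *
# Keep crawlers out of these (hypothetical) directories
Disallow: /family-photos/
Disallow: /internal/

# A robot-specific group overrides the wildcard group for that robot
User-agent: Googlebot
Disallow: /internal/
```

Keep in mind that robots.txt is purely advisory: well-behaved crawlers like Googlebot will respect it, but it is not an access control mechanism, so genuinely private pages still need real authentication.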
Luckily, Google now offers a tool that will automatically generate a robots.txt file for you, saving some time and perhaps avoiding an unintentional disaster.
Using this tool can help you control which pages of your website robots can visit, making sure they keep coming back on your terms, without terminating you.