What is a Robots.txt file? And why is it important for SEO?

Robots.txt is a text file with instructions for search engine robots that tells them which pages they should and shouldn’t crawl.

According to Google Search Central, “A robots.txt file tells search engine crawlers which URLs the crawler can access on your site. This is used mainly to avoid overloading your site with requests.”

Put simply, if you specify in your robots.txt file that you don’t want search engines to crawl a certain page, that page will generally not show up in the search results and web users won’t be able to find it there (note that a disallowed URL can still be indexed if other sites link to it, so robots.txt is not a watertight privacy tool). Keeping search engines away from certain pages on your site is important both for the privacy of your site and for your SEO.

So, why are robots.txt files important for Search Engine Optimisation (SEO)?

Robots.txt is important to SEO because it is the first file Googlebot checks when it reaches your site, and it lets search engines know what they can and cannot crawl.

Utilising a robots.txt file correctly helps you manage the limited crawl budget Google allocates to your site, and makes sure your most important pages are crawled, indexed and discoverable on search engine results pages.

How to find your robots.txt file?

Finding and viewing your site’s robots.txt file isn’t a difficult technical SEO task; it is actually quite simple.

Simply type the URL of your website followed by /robots.txt, e.g. www.exampleURL.com.au/robots.txt

How to create a robots.txt file?

When creating a robots.txt file, you first need to specify which user-agents are permitted to crawl your site. A user-agent in the robots.txt file is the name of the software used to access your site, such as Googlebot or Bingbot. You can allow all user-agents to crawl your site by using an asterisk (*), or you can be more specific and only allow certain user-agents. For instance, you could have your robots.txt disallow all crawlers except Googlebot or Bingbot.
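For example, a minimal robots.txt that lets every crawler access everything looks like this (an empty Disallow line means nothing is blocked):

User-agent: *
Disallow:

And a sketch of a file that blocks every crawler except Googlebot and Bingbot might look something like this:

User-agent: Googlebot
Disallow:

User-agent: Bingbot
Disallow:

User-agent: *
Disallow: /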

Secondly, you will need to specify which directories or files you want to block. This is done by using the “Disallow” command, followed by the path of the directory or file you want to block. You can block multiple items by using numerous Disallow lines.
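As an illustrative sketch (the paths here are made up), blocking an admin area and a single PDF for all crawlers would look like this:

User-agent: *
Disallow: /admin/
Disallow: /private/price-list.pdf

Each Disallow line applies to the user-agent group above it, and the path is matched from the start of the URL.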

Lastly, you can ask a crawler to wait a set number of seconds between requests by using the “Crawl-delay” directive. This is valuable if you have a large website or know that your site is frequently updated. By setting a crawl delay, you can help ensure that crawlers won’t overload your server while fresh content still gets crawled.
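For instance, asking crawlers that support the directive to wait ten seconds between requests would look like this (the value is purely an example):

User-agent: *
Crawl-delay: 10

Keep in mind that support varies: Bing honours Crawl-delay, but Googlebot ignores it, so treat it as a throttle for some crawlers rather than a guarantee.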

What happens if you don’t use a robots.txt file?

If you do not use a robots.txt file, search engines are free to crawl and index every page on your website, including pages you would rather keep out of search results. This can waste crawl budget on low-value pages, and it may make it harder for visitors searching for your content to find the pages that actually matter.

At WebEagles, we understand the importance of technical SEO and everything that comes with it. If you have any questions about implementing robots.txt on your website, give us a call on 1300 123 808 and one of our friendly SEO experts will be able to assist.