Guide to the robots.txt file: what it is and why it is so important
SEO Tester Online
13 December 2019


In this article, we will explore the role of robots.txt, a small file that can make the difference between earning a high ranking and languishing in the lowest depths of the SERP.

What is robots.txt

Robots.txt’s role is to tell crawlers which pages they can request from your site. Beware: the spider can still find those pages; it just does not scan them. If you want to hide a page, you should rely on noindex instructions instead, as specified by Google’s Search Console guide.
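
For reference, a noindex instruction usually takes the form of a robots meta tag placed in the page’s <head>, something like:

<!-- tells compliant crawlers not to include this page in their index -->
<meta name="robots" content="noindex">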

So, why do you need a robots.txt file? Because it makes crawling faster and smoother, saving your server from too many crawler requests. It also lets you exclude from scanning duplicate or non-essential pages that could hurt your ranking.
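
As a minimal sketch, a robots.txt along these lines (the /print/ and /search/ paths are purely hypothetical examples of duplicate or low-value sections) keeps every crawler focused on the pages that matter:

# Keep all crawlers away from duplicate and non-essential sections (hypothetical paths)
User-agent: *
Disallow: /print/
Disallow: /search/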

Where to put robots.txt

You have to put the robots.txt file inside your website’s main directory so that its URL is http://www.mywebsite.com/robots.txt.

Do not put it elsewhere, or it won’t work.
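
To illustrate, assuming a hypothetical /pages/ subdirectory on the same site:

http://www.mywebsite.com/robots.txt         # correct: crawlers request the file from the root
http://www.mywebsite.com/pages/robots.txt   # ignored: crawlers never look in subdirectories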

How to create a robots.txt file

Create a .txt file in your website’s main directory and call it “robots”. Remember that you can have only one robots.txt file per site.

Create a group

Create your first group. A robots.txt file can have one or more groups. 

Each group has one or more instructions (also called rules). Remember to use only one instruction for each line.
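
As a sketch, a file with two groups might look like this (the directories are placeholders), with every instruction on its own line:

# Group 1: applies only to Googlebot
User-agent: googlebot
Disallow: /example-directory/

# Group 2: applies to every other crawler
User-agent: *
Disallow: /another-directory/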

Robots.txt instructions

Instructions can be of three types:

  • user-agent: the crawler to which the rules apply.
  • allow: the files or directories that the user-agent can access.
  • disallow: the files or directories that the user-agent cannot access.

A group must target one or more (or all!) user agents and include at least one allow or disallow instruction (or both).

Robots.txt examples

For example, to prevent Googlebot from scanning your entire website, you would write something like this in your robots.txt file:

# Prevent Googlebot from scanning the site (this is a comment: you can write whatever you want)

User-agent: googlebot
Disallow: /

If, instead, you want to exclude more than one directory for all crawlers:

User-agent: *
Disallow: /directory1
Disallow: /directory2

(The asterisk means “all crawlers”.)
Or maybe you want to exclude all directories but one for a specific crawler:

User-agent: specific-crawler
Allow: /directory1
Disallow: /

User-agent: *
Allow: /

In this way, you’re stating that every other crawler can access the entire website.

Finally, we can prevent the scanning of a specific file format, for example, jpg images.

User-agent: *
Disallow: /*.jpg$

The $ character establishes a rule that applies to all URLs ending in .jpg.
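
The same pattern works for other extensions too; for instance, a sketch that keeps only a hypothetical crawler named specific-crawler away from PDF files:

User-agent: specific-crawler
Disallow: /*.pdf$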

To see more examples, visit Google’s Search Console guide.

Learn more about Technical SEO

Technical SEO is not easy, but it’s fundamental to doing SEO the right way.

Learn it by reading our Guide for Beginners to Technical SEO.
