What is XML Sitemap: how to generate it and send it to Google

In this article, we’ll look at what an XML Sitemap is and why it’s so important for your website.

We’ll also see how to create it and send it to Google with the help of Google Search Console.

Let’s start!

What is an XML Sitemap?

An XML Sitemap is a file where you can list all the pages of your website. In other words, it lets you lay out all the URLs of your site in an orderly way.

It also makes it easier for Google’s crawler to explore your website, scan it, and index your content.

Is it important to have an XML Sitemap?

Having an XML Sitemap helps Google to figure out how your site is structured, allowing it to reduce the time needed to index your content.

According to Google, you should have a sitemap especially in these cases:

  • When you have or manage a very large and complex website, composed of more than 500 pages. On sites this large, the web crawler sometimes neglects new content.
  • When you have a new site with no external links pointing to it yet.
  • When your website has a news section or a lot of images or videos.

If you’re not in any of those cases, don’t worry: creating an XML Sitemap is always a good choice, since it helps you structure your website and keep it tidy, even if it has only a few pages. In the long term, it will pay off.

How to create an XML Sitemap

Creating an XML Sitemap is a very simple process.

If you normally use WordPress, there are a lot of plugins that will do this job for you automatically.

The first tool we recommend for creating a sitemap is Yoast SEO, a free and intuitive plugin. All you have to do is download and install it.


Then, you have to activate the advanced settings and, finally, turn on the XML Sitemaps feature.

Another plugin that we highly recommend is Google XML Sitemaps. It automatically generates your sitemap and, moreover, immediately reports all your new content to the search engine.

If you don’t have a WordPress website, don’t panic. You can use a tool like XML-Sitemaps, an online platform that helps you create a sitemap for free.


Now that you have your sitemap, you need to send it to Google. Let’s see how.

How to send an XML Sitemap to Google

To complete this action you must use Google Search Console, a free tool that any SEO Specialist should know and master.

Thanks to Search Console, you can send your XML Sitemap in 3 easy steps:

  1. Log in to Google Search Console and select your website
  2. Click on “Sitemaps” (in the navigation bar on the left)
  3. Enter the URL of your sitemap in the appropriate field
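
You can also point crawlers to your sitemap directly from your robots.txt file (more on robots.txt later in this guide). One line is enough; here we assume the sitemap lives at the root of the example domain used below:

Sitemap: http://www.mywebsite.com/sitemap.xml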

Now that we know the right steps to submit our sitemap, it’s useful to look at the different types of sitemaps.

Types of Sitemaps

There are different types of sitemaps, depending on what you need and who they are meant for. Let’s see them!

XML Sitemap


This is the most popular type today, created to help crawlers such as Googlebot index your content easily. As we said before, this file allows them to “see” all the sections of your website. However, the XML Sitemap has some limits that must be respected: it can contain no more than 50,000 URLs, and the uncompressed file must not exceed 50 MB. Besides the list of URLs, the file can hold other information that may be useful to you, such as (see the example after this list):

  • Date of the last update of the page.
  • Frequency of update.
  • The priority of the URL over other pages of the site. 
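
To make this concrete, here is a minimal sketch of such a file, following the sitemaps.org protocol (the URL and values are made up):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.mywebsite.com/category/keyword.html</loc>
    <lastmod>2020-01-24</lastmod>      <!-- date of the last update -->
    <changefreq>monthly</changefreq>   <!-- frequency of update -->
    <priority>0.8</priority>           <!-- priority over other pages -->
  </url>
</urlset>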

HTML Sitemap

This is a sitemap designed exclusively for users.

It contains all the URLs of the website and facilitates navigation by mapping the structure of the site. Unlike the XML format, the HTML sitemap is clearly readable by a normal user and plays no role in the indexing of your website.
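
Since it is just an ordinary web page, a minimal sketch might look like this (the pages listed are hypothetical):

<!-- A plain page of links mirroring the site structure -->
<h1>Sitemap</h1>
<ul>
  <li><a href="/blog/">Blog</a>
    <ul>
      <li><a href="/blog/first-post.html">First post</a></li>
    </ul>
  </li>
  <li><a href="/contact.html">Contact</a></li>
</ul>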

News Sitemap

A type that can be appropriate if you have a section of your site dedicated exclusively to news.
By submitting it, you can improve your ranking in the Google News section.
If you are wondering how to create a sitemap specifically for Google News, we suggest consulting the Search Console guide.
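
For reference, here is a minimal sketch of a Google News sitemap entry, which uses a dedicated news namespace on top of the standard protocol (the publication and article are made up):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>http://www.mywebsite.com/news/article.html</loc>
    <news:news>
      <news:publication>
        <news:name>My Website News</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:publication_date>2020-01-24</news:publication_date>
      <news:title>A sample article title</news:title>
    </news:news>
  </url>
</urlset>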

Image Sitemaps

They are sitemaps designed, as you can imagine, for images and their related content.

This will be useful for ranking in the Google Images search section and lets you add helpful information such as (see the sketch below):

  • Location
  • Caption
  • Title
  • URL

Remember that you can list up to 1,000 images per page.
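
Here is a minimal sketch of an image sitemap entry, which uses Google’s image namespace on top of the standard protocol (all URLs and values are made up):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>http://www.mywebsite.com/gallery.html</loc>
    <image:image>
      <image:loc>http://www.mywebsite.com/images/photo.jpg</image:loc>
      <image:caption>A sample caption</image:caption>
      <image:title>A sample title</image:title>
      <image:geo_location>Rome, Italy</image:geo_location>
    </image:image>
  </url>
</urlset>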

Video Sitemap

Suitable for helping your potential users find you in Google’s Video section. In this case, it is important to specify information such as (see the sketch after this list):

  • The video’s category.
  • Its duration.
  • Its title.
  • The URL.
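
A minimal sketch of a video sitemap entry, using Google’s video namespace (all URLs and values are made up; the duration is expressed in seconds):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>http://www.mywebsite.com/videos/tutorial.html</loc>
    <video:video>
      <video:thumbnail_loc>http://www.mywebsite.com/thumbs/tutorial.jpg</video:thumbnail_loc>
      <video:title>A sample tutorial</video:title>
      <video:description>What the video is about</video:description>
      <video:content_loc>http://www.mywebsite.com/media/tutorial.mp4</video:content_loc>
      <video:duration>600</video:duration>
      <video:category>Tutorials</video:category>
    </video:video>
  </url>
</urlset>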

Have you uploaded the Sitemap correctly?

Now you know how to create a sitemap, which types are available, and how to send it to Google.

But… have you checked if you uploaded it correctly?

You can do so for free with our SEO Checker:

  1. Insert your website’s URL.
  2. Go to the “Base” tab.
  3. Check whether the XML Sitemap was uploaded correctly under the “Sitemap” item.

Verify if the XML Sitemap is correctly installed on your website.

Guide to file Robots.txt: what it is and why it is so important

In this article, we will explore the role of robots.txt, a small file that can make the difference between scoring a high ranking and languishing in the lowest depths of the SERP.

What is robots.txt

Robots.txt’s role is to tell the crawler which pages it can request from your site. Beware: the spider can still see them; it just does not scan them. If you want to hide a page, you should rely on noindex instructions, as specified in Google’s Search Console guide.

So, why do you need a robots.txt file? Because it makes crawling faster and smoother, saving your server from too many crawler requests, and because it lets you exclude duplicate or non-essential pages that could hurt your ranking from the scan.

Where to put robots.txt

You have to put the robots.txt file inside your website’s main directory so that its URL is http://www.mywebsite.com/robots.txt.

Do not put it elsewhere, or it won’t work.

How to create robots.txt file

Create a .txt file in your website’s main directory and call it “robots”. Remember that you can have only one robots.txt file per site.

Create a group

Create your first group. A robots.txt file can have one or more groups. 

Each group has one or more instructions (also called rules). Remember to use only one instruction for each line.

Robots.txt instructions

Instructions can be of three types:

  • user-agent: the crawler to which the rule applies.
  • allow: the files or directories the user-agent can access.
  • disallow: the files or directories the user-agent cannot access.

A rule must include one or more (or all!) user agents, and at least one allow or disallow instruction (or both).

Robots.txt examples

For example, to prevent Googlebot from scanning your entire website, you would write something like this in your robots.txt file:

# Prevent Googlebot from scanning (this is a comment; you can write whatever you want)

User-agent: googlebot
Disallow: /

If you instead want to exclude more than one directory for all crawlers:

User-agent: *
Disallow: /directory1
Disallow: /directory2

(the asterisk means “all”)

Or you may want to allow a specific crawler access to only one directory, excluding all the others:

User-agent: specific-crawler
Allow: /directory1
Disallow: /

User-agent: *
Allow: /

In this way, you’re stating that every other crawler can access the entire website.

Finally, we can prevent the scanning of a specific file format, for example .jpg images:

User-agent: *
Disallow: /*.jpg$

The $ character establishes a rule valid for all URLs that end with .jpg.

To see more examples, visit Google’s Search Console guide.

Learn more about Technical SEO

Technical SEO is not easy. But it’s fundamental to doing SEO the right way.

Learn it by reading our Guide for Beginners to Technical SEO.

How to create SEO-friendly URLs

You probably know that the URL (Uniform Resource Locator) is the home address of your site. It’s what we type into our browser’s address bar when we want to access a website.

A thing you may not know is that the URLs affect SEO, too.

In this article, you will find out how to optimize URLs for SEO and turn them into allies for your ranking.

The importance of an SEO-Friendly URL

An SEO-friendly URL is a web address that helps the user remember the address and understand its logic. The latter is essential for the crawler, too.

A proper SEO URL helps the crawler to understand what the user can find on that page. In this way, it can make the page available for the right query.

The structure of an SEO URL

A good SEO URL must describe the path of the page and the site structure. Besides, it must tell the user (and the spider) what they are going to find there.

Something like:

http://www.mywebsite.com/category/keyword.html

An example of a properly optimized URL

Take the URL to our SEO Checker:

https://www.seotesteronline.com/seo-checker/

See? Pretty straightforward: there’s our name and the tool you’re going to find.

Another example comes from our knowledge base.

Our subdomain (help) tells you that you can find help there. Where? On seotesteronline.com.

What follows tells you that you are inside the knowledge base, in the section dedicated to the Keyword Explorer.

And what are you going to find there? You can tell from the last bit!

What is the slug and how to make it SEO-friendly

The last fragment of the URL (what-is-keyword-explorer) is called the slug. It is the element that describes the page. Our advice is to use the main keyword you chose for that page as the slug.

Check your SEO-friendly URL

To summarize, here is a handy checklist to create perfect SEO-friendly URLs:

  • Make them descriptive of the content and the site structure;
  • Use pertinent keywords, especially on the slug;
  • Use subdomains responsibly: the crawler can consider subdomains as separate websites, watering down your SEO efforts;

URLs must be easy to read for both users and crawlers, so keep them short, without too many subfolders or special characters (like the ones in dynamic URLs).
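
To make the contrast concrete, here is a hypothetical before-and-after (both URLs are made up). A dynamic URL, hard to read for users and crawlers:

http://www.mywebsite.com/index.php?id=123&cat=7

The same page with an SEO-friendly URL:

http://www.mywebsite.com/technical-seo/xml-sitemap.html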

Guide to HTTP Status Codes

HTTP status codes are standard messages exchanged between the client and the server when they communicate through HTTP (which stands for Hypertext Transfer Protocol).

For example, when you (the client) click on a link or type a URL into the address bar, you are sending a request to the server. You’re asking it to let you view a webpage. The “language” of this request is HTTP. The response you get from the server includes a three-digit code.
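
Here is a simplified sketch of such an exchange (the host is made up, and most headers are omitted). The first two lines are the client’s request; below them is the server’s response, with the three-digit code in its first line:

GET /index.html HTTP/1.1
Host: www.mywebsite.com

HTTP/1.1 200 OK
Content-Type: text/html

<html> … the requested page … </html>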

In this article, we’ll sort the most common HTTP status codes.

The codes fall into five classes:

  • informational responses; 
  • successful responses;
  • redirection responses;
  • client error responses; 
  • server error responses.

100 HTTP status codes (informational responses)

These codes tell us that the server received the request, and it is processing it. The answer can be:

  • 100 (continue): the server received the request header (the information about the request). The client can go on sending the request body (the actual data payload);
  • 101 (switching protocols): the server received the client’s request to switch protocol.
  • 102 (processing): the server got the request, but it cannot respond yet. This response is given to keep the connection from timing out.

200 HTTP status codes (successful responses)

Successful responses begin with number two. They mean that the server received the request and has accepted it:

  • 200 (OK): it is the generic success response;
  • 201 (Created): the server created the requested resource;
  • 202 (Accepted): the server accepted the request, but it is still working to return the response;
  • 203 (Non-Authoritative Information): same as 200, but a transforming proxy received the request and is returning a modified response;
  • 204 (No Content): the server successfully processed the request and returned no content;
  • 205 (Reset Content): same as 204, but the response also requires the requester to reset the document view;
  • 206 (Partial Content): the server is returning only part of the requested resource. This happens when the client sends a Range header, whose role is to split a large download into many smaller, simultaneous ones;
  • 207 (Multi-Status): a response to multiple requests; it indicates that the body includes the individual response codes;
  • 208 (Already Reported): used in WebDAV responses; it indicates that a previous 207 code already reports these responses.

300 HTTP Status codes (Redirections)

These status codes indicate that the server needs to take additional action to complete the request, such as redirection. 

Redirection is essential in SEO. We can use it to tell the browser that a resource is no longer at its original URL. It is useful when we delete a page and want to redirect the user instead of serving a 404 page.

Another reason could be a change in the URL structure or the domain.
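
As a concrete illustration, on an Apache server a permanent redirect can be declared in the .htaccess file like this (the paths are hypothetical; other servers and CMSs have their own equivalents):

# Permanently redirect an old URL to its new location
Redirect 301 /old-page.html http://www.mywebsite.com/new-page.html

Let’s look at the individual 3xx codes: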

  • 300 (Multiple Choices): the client has multiple options, for example regarding the format in which to download the requested resource.
  • 301 (Moved Permanently): the client must direct the request (and all future requests) to another URI. 
  • 302 (Found): can be used to indicate a temporary redirection, or to tell the client to look for another URL.
  • 303 (See Other): the client can find the requested resource at another URI.
  • 304 (Not Modified): the client already possesses the requested resource. The server cannot provide a more updated version.
  • 305 (Use Proxy): The requested resource is available through a proxy.
  • 307 (Temporary Redirect): same as 302, but the client must not change the request method.
  • 308 (Permanent Redirect): same as 301, but the client must not change the request method.

400 HTTP Status codes (Client errors)

This class of codes indicates an error that concerns the client.

It can be a bad request, a requested resource that was not found, or a lack of privileges to access it.

The response must include in its body an explanation of the error and whether it is temporary or permanent. 

  • 400 (Bad Request): the server is not able to process the request, because of a syntax error, a request size that is too large, or an otherwise invalid request.
  • 401 (Unauthorized): The client cannot access the resource it requested. It failed to authenticate, or there is no authentication possible.
  • 403 (Forbidden): unlike 401, the client authenticated itself, but the server still refuses to fulfill the request.
  • 404 (Not found): the most famous response code. The server has not found the requested resource. However, it could be available in the future.
  • 405 (Method Not Allowed): the client sent the request using an invalid method, for example using GET when it should have used POST.
  • 406 (Not Acceptable): the server can generate the resource requested, but its format is not among those accepted in the request header.
  • 407 (Proxy authentication): The client must authenticate itself with the proxy.
  • 408 (Request timeout): The server timed out while waiting for the request.
  • 409 (Conflict): the request cannot be processed because of an ongoing conflict between different versions of the same resource. It can happen when the server receives more than one edit request for the same resource at the same time.
  • 410 (Gone): The requested resource is not available. Unlike 404, the resource in question won’t be available again.
  • 411 (Length Required): the request does not indicate the length of its content, as required by the resource.
  • 412 (Precondition Failed): The server cannot process the requests. The reason is that it does not possess one of the preconditions specified in the request.
  • 413 (Request Entity Too Large): The request is too large for the server, and it cannot manage it. 
  • 414 (URI Too Long): The URI contained in the request is too large to be processed by the server.
  • 415 (Unsupported Media Type): The client requested a resource in a format the server doesn’t support.
  • 416 (Range Not Satisfiable): The client requested a file fragment, but the server cannot satisfy the request.
  • 417 (Expectation Failed): The server cannot comply with the requirements contained in the request header.
  • 418 (I’m a teapot): a joke code published by the IETF as an April Fools’ RFC.
  • 420 (Enhance your calm): Used by the Twitter APIs to indicate that the client made too many requests in a short time.
  • 421 (Misdirected request): the server that received the request cannot process it.
  • 422 (Unprocessable Entity): A semantic error prevents the server from processing the request.
  • 423 (Locked): The client cannot access the resource because it is locked.
  • 426 (Upgrade required): The client should use a better security protocol.
  • 429 (Too many requests): The client sent too many requests too fast. 
  • 451 (Unavailable for Legal Reasons): The client requested a resource whose access is limited by censorship or government request. It could be a reference to Ray Bradbury’s Fahrenheit 451.

500 HTTP Status codes (Server errors)

500 status codes indicate server errors: the client’s request is valid, but the server cannot fulfill it.

  • 500 (Internal Server Error): generic error message, unexpected error.
  • 501 (Not Implemented): the server cannot recognize the request method or is unable to perform the request at the moment.
  • 502 (Bad Gateway): the server is acting as a proxy. The upstream server sent an invalid response.
  • 503 (Service Unavailable): the server cannot fulfill the request at the moment because it is down, for example because it is undergoing maintenance.

Which codes are essential for SEO

Some codes are more important than others in SEO because they can influence the ranking.

We can sort them into three groups.

Found

It includes only code 200, which indicates that the page was found at the indicated URL, as expected.

Redirects

It includes the codes 301, 302, and 303. They indicate that the resource is elsewhere, either temporarily or permanently. Their correct implementation is critical to avoid penalties.

Not found

A 404 HTTP status code disrupts crawling, and the ranking of the whole website can suffer as a result.

Find out which codes your website returns

To discover which codes the pages of a website return, scan it with the SEO Spider.


Just enter the URL you want to analyze, then navigate the menu on the left: under Structure, select Status. You can also filter the codes by the groups mentioned above.