An XML (Extensible Markup Language) Sitemap is a text file used to detail all URLs on a website. It can include extra information (metadata) on each URL, with details of when they were last updated, how important they are and whether there are any other versions of the URL created in other languages.
All of this is done to help the search engines crawl your website more efficiently, allowing any changes to be fed to them directly, including when a new page is added or an old one removed.
There is no guarantee that an XML Sitemap will get your pages crawled and indexed by search engines, but having one certainly increases your chances, particularly if your navigation or general internal linking strategy doesn’t link to all of your pages.
Sitemaps come in four primary categories:
Normal XML Sitemap: The most popular kind of sitemap. It lists links to the various pages on your website.
Video Sitemap: Use a video sitemap to explain the video content on your page to Google.
News Sitemap: Helps Google find content on websites that have been approved for Google News.
Image Sitemap: Aids Google in locating all of the images stored on your website.
Sitemaps: Why Are They Important?
Your sitemap helps search engines like Google, Yahoo, and Bing find the various pages on your website.
A sitemap won’t hurt your SEO efforts in any way, so using one makes sense. There are also a few situations in which a sitemap is especially helpful.
For instance, Google mostly finds web pages through links. So if your website is new and has few external links pointing to it, a sitemap is especially important for helping Google find its pages.
Here’s how to create a sitemap and make it search-engine friendly.
Sitemap Creation Best Practices
The first step is to create a sitemap.
You can make one with our XML Sitemaps generator and use the XML file it produces as your sitemap. Whichever tool you use, review the sitemap carefully once it is built.
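If you would rather script it yourself, here is a minimal sketch (not how our generator works internally) that builds a sitemap with Python's standard library. It assumes you already have a list of page URLs and their last-modified dates; `build_sitemap` and the example.com URLs are hypothetical.

```python
from datetime import date
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Register it as the default namespace so tags are written without a prefix.
ET.register_namespace("", SITEMAP_NS)


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, last_modified_date) pairs."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        # date.isoformat() produces YYYY-MM-DD, a valid W3C Datetime value.
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = last_modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


pages = [
    ("https://www.example.com/", date(2024, 1, 15)),
    ("https://www.example.com/about/", date(2023, 11, 2)),
]
print(build_sitemap(pages))
```

ElementTree escapes characters such as `&` in URLs automatically, which the sitemap protocol requires.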
Your sitemap is typically found at https://www.example.com/sitemap.xml. It should list every page on your website that you want search engines to index.
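To check that the file lists what you expect, you can count its `<loc>` entries. This is a rough sketch that assumes a plain sitemap rather than a sitemap index; `listed_urls` is a hypothetical helper name and the sample document is made up.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol, mapped to a prefix for findall().
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def listed_urls(sitemap_xml):
    """Return the <loc> value of every <url> entry in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [(loc.text or "").strip() for loc in root.findall("sm:url/sm:loc", NS)]


# To check a live site you could download the file first, for example:
#   from urllib.request import urlopen
#   with urlopen("https://www.example.com/sitemap.xml") as resp:
#       urls = listed_urls(resp.read())
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/contact/</loc></url>
</urlset>"""
urls = listed_urls(sample)
print(len(urls), "URLs listed")
```

A sitemap index file uses `<sitemap>` entries instead of `<url>`, so this function would return an empty list for one.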
If everything looks good, the next step is to submit your sitemap to Google.
Sitemap submission to Google
Log in to your Google Search Console account to submit your sitemap.
Then go to “Index” → “Sitemaps” in the sidebar.
Enter your sitemap’s URL in the “Add a new sitemap” field and click “Submit”. If you’ve already submitted a sitemap, it will appear in the “Submitted Sitemaps” list on this page.
If everything is configured properly, you will begin to see information about your sitemap on this page in the “Submitted Sitemaps” area.
Glossary of Tags Used in a Sitemap
<urlset> - The Sitemap opens and closes with this tag. It also references the current protocol standard through its xmlns attribute.
<url> - This is the parent tag for each URL entry.
<loc> - This tag contains the absolute URL, or the locator of the page.
<lastmod> - This contains the date the page was last modified. It should be in W3C Datetime format, the simplest form of which is YYYY-MM-DD.
<changefreq> - This indicates how often the page is likely to change. The valid values are always, hourly, daily, weekly, monthly, yearly and never.
<priority> - This indicates the file’s importance within the site. The value ranges from 0.0 to 1.0.
<xhtml:link> - This tag provides details of alternate versions of the URL offered in other languages.
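To see how these tags fit together, here is a sketch that writes a single `<url>` entry using all of them. The URLs, date and language codes are made-up examples and `url_entry` is a hypothetical helper; the xhtml namespace URI is the standard XHTML one that hreflang sitemap annotations use.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
# Sitemap tags get no prefix; alternate-language links use the xhtml: prefix.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def url_entry(urlset, loc, lastmod, changefreq, priority, alternates):
    """Append one <url> element that uses every tag from the glossary.

    alternates maps a language code (hreflang) to the URL of that version.
    """
    url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
    ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    ET.SubElement(url, f"{{{SITEMAP_NS}}}changefreq").text = changefreq
    ET.SubElement(url, f"{{{SITEMAP_NS}}}priority").text = f"{priority:.1f}"
    for lang, href in alternates.items():
        ET.SubElement(url, f"{{{XHTML_NS}}}link",
                      rel="alternate", hreflang=lang, href=href)
    return url


urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
url_entry(
    urlset,
    loc="https://www.example.com/en/pricing/",
    lastmod="2024-01-15",
    changefreq="monthly",
    priority=0.8,
    alternates={
        "en": "https://www.example.com/en/pricing/",
        "de": "https://www.example.com/de/preise/",
    },
)
print(ET.tostring(urlset, encoding="unicode"))
```

The page lists itself among its alternates; Google’s guidance on localized sitemaps asks for each language version to list all versions, including its own.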
NOTE:
- The loc tag is compulsory, while the lastmod, changefreq and priority tags are optional.
- Ideally, an XML Sitemap should be added to the root directory of the website. All URLs in the Sitemap must come from the same host.
- Only the canonical version of all page URLs should be included, so pages should not redirect or return an error status.
- URLs must be less than 2,048 characters long.
- While it may seem possible to trick search engines into thinking your page content is updated frequently by setting changefreq to daily, it is not advisable. If the changefreq and priority values do not reflect reality, search engine crawlers are likely to ignore them.
- If you need help building your sitemap, there are several sitemap generator tools to help.
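Several of these rules can be checked mechanically before you submit. The sketch below assumes each entry has already been parsed into a Python dict; `check_entry` is a hypothetical name, and it cannot check the canonical/redirect rule, which needs live HTTP requests.

```python
import re
from datetime import date
from urllib.parse import urlsplit

# Values the sitemaps.org protocol allows for <changefreq>.
ALLOWED_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def _is_plain_date(value):
    """True for a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:  # e.g. 2024-02-30
        return False
    return True


def check_entry(entry, site_host):
    """Return a list of rule violations for one sitemap entry.

    entry is a dict such as {"loc": ..., "lastmod": ..., "priority": ...};
    site_host is the single host every URL must use, e.g. "www.example.com".
    """
    loc = entry.get("loc")
    if not loc:
        return ["missing <loc> (the only required tag)"]

    problems = []
    parts = urlsplit(loc)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        problems.append("<loc> is not an absolute URL")
    elif parts.netloc != site_host:
        problems.append(f"URL is on {parts.netloc}, not {site_host}")
    if len(loc) >= 2048:
        problems.append("URL is 2,048 characters or longer")

    # This sketch only accepts the date-only form; full W3C Datetime
    # values with a time part would be flagged here.
    lastmod = entry.get("lastmod")
    if lastmod is not None and not _is_plain_date(lastmod):
        problems.append("lastmod is not a valid YYYY-MM-DD date")

    freq = entry.get("changefreq")
    if freq is not None and freq not in ALLOWED_CHANGEFREQ:
        problems.append(f"unrecognised changefreq value: {freq}")

    priority = entry.get("priority")
    if priority is not None:
        try:
            in_range = 0.0 <= float(priority) <= 1.0
        except ValueError:
            in_range = False
        if not in_range:
            problems.append("priority must be a number from 0.0 to 1.0")
    return problems


print(check_entry({"loc": "https://blog.example.com/page", "priority": "2"}, "www.example.com"))
```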
How to Find Errors Using the Sitemap Report
Once Google has crawled your sitemap, click on it under “Submitted Sitemaps”. If you see the message “Sitemap index processed successfully”, Google was able to crawl it.
You may access the Coverage Report for your sitemap by clicking on the tiny bar chart icon on the right side.
This report reveals the number of URLs that Google discovered in your sitemap and the proportion of those pages that made it into Google’s index.
For instance, you can see that my sitemap links to 100 URLs: 58 are “Valid” and 42 are “Not indexed”.
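The percentages behind this example are simple to compute. A tiny sketch, where `index_coverage` is a hypothetical helper and the counts are the ones from the example:

```python
def index_coverage(submitted, indexed):
    """Summarise how many submitted sitemap URLs made it into Google's index."""
    if submitted <= 0:
        raise ValueError("submitted must be a positive count")
    return {
        "indexed": indexed,
        "not_indexed": submitted - indexed,
        "indexed_pct": round(100 * indexed / submitted, 1),
    }


print(index_coverage(submitted=100, indexed=58))
# → {'indexed': 58, 'not_indexed': 42, 'indexed_pct': 58.0}
```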
The valid pages can be disregarded, since everything seems to be fine with them. But I definitely want to examine what’s going on with the “Not indexed” pages.
Before doing that, we need to know which statuses and issue types this report can show.
The Index Coverage report distinguishes four statuses:
- Valid: pages that have been indexed.
- Valid with warnings: pages that have been indexed but have some problems you might want to look into.
- Excluded: pages that were not indexed because search engines picked up clear signals that they shouldn’t be indexed.
- Error: pages that couldn’t be indexed for some reason.
Each status contains one or more types, listed below.
Valid Pages
The “Valid” status only contains two types:
- Submitted and indexed
- Indexed, not submitted in sitemap
Valid URLs with warnings
The “Valid with warnings” status only contains two types:
- Indexed, though blocked by robots.txt
- Indexed without content
Excluded URLs
The “Excluded” status contains the following types:
- Alternate page with proper canonical tag
- Blocked by page removal tool
- Blocked by robots.txt
- Blocked due to access forbidden (403)
- Blocked due to other 4xx issue
- Blocked due to unauthorized request (401)
- Crawled – currently not indexed
- Discovered – currently not indexed
- Duplicate without user-selected canonical
- Duplicate, Google chose different canonical than user
- Duplicate, submitted URL not selected as canonical
- Excluded by ‘noindex’ tag
- Not found (404)
- Page removed because of legal complaint
- Page with redirect
- Soft 404
Error URLs
The “Error” status contains the following types:
- Redirect error
- Server error (5xx)
- Submitted URL blocked by robots.txt
- Submitted URL blocked due to other 4xx issue
- Submitted URL has crawl issue
- Submitted URL marked ‘noindex’
- Submitted URL not found (404)
- Submitted URL seems to be a Soft 404
- Submitted URL returned 403
- Submitted URL returns unauthorized request (401)
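If you’re working through a long list of issue types, a lookup table built from the lists above can sort them by status so errors get looked at first. This is a hypothetical helper, not a Search Console feature, and newer versions of Search Console may label some types differently.

```python
# Issue types copied from the lists above, grouped by Index Coverage status.
COVERAGE_TYPES = {
    "Valid": [
        "Submitted and indexed",
        "Indexed, not submitted in sitemap",
    ],
    "Valid with warnings": [
        "Indexed, though blocked by robots.txt",
        "Indexed without content",
    ],
    "Excluded": [
        "Alternate page with proper canonical tag",
        "Blocked by page removal tool",
        "Blocked by robots.txt",
        "Blocked due to access forbidden (403)",
        "Blocked due to other 4xx issue",
        "Blocked due to unauthorized request (401)",
        "Crawled – currently not indexed",
        "Discovered – currently not indexed",
        "Duplicate without user-selected canonical",
        "Duplicate, Google chose different canonical than user",
        "Duplicate, submitted URL not selected as canonical",
        "Excluded by ‘noindex’ tag",
        "Not found (404)",
        "Page removed because of legal complaint",
        "Page with redirect",
        "Soft 404",
    ],
    "Error": [
        "Redirect error",
        "Server error (5xx)",
        "Submitted URL blocked by robots.txt",
        "Submitted URL blocked due to other 4xx issue",
        "Submitted URL has crawl issue",
        "Submitted URL marked ‘noindex’",
        "Submitted URL not found (404)",
        "Submitted URL seems to be a Soft 404",
        "Submitted URL returned 403",
        "Submitted URL returns unauthorized request (401)",
    ],
}

# Reverse lookup: issue type -> status.
STATUS_OF = {t: status for status, types in COVERAGE_TYPES.items() for t in types}

# Errors first, since those pages could not be indexed at all.
REVIEW_ORDER = ["Error", "Valid with warnings", "Excluded", "Valid", "Unknown"]


def triage(issue_types):
    """Group issue types by status in REVIEW_ORDER; unmatched types go to 'Unknown'."""
    grouped = {}
    for issue in issue_types:
        grouped.setdefault(STATUS_OF.get(issue, "Unknown"), []).append(issue)
    return {status: grouped[status] for status in REVIEW_ORDER if status in grouped}


print(triage(["Soft 404", "Server error (5xx)", "Submitted and indexed"]))
```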
Use your sitemap to look for indexing issues
A cool feature of a sitemap is that it gives you a rough idea of:
- How many pages you WANT indexed
- How many pages ARE indexed
Let’s say your sitemap links to 3,000 pages, but Google Search Console shows that only 1,000 of them are indexed. That gap indicates that something is wrong. Perhaps those 3,000 pages include a significant amount of duplicate content, which Google won’t index. It’s also possible that your website has more pages than your crawl budget allows.
Match Your Robots.txt and Sitemaps
Your sitemap and robots.txt file need to work together.
You don’t want to give Google conflicting signals. So you DO NOT want a page to appear in your sitemap if you have blocked it in robots.txt or added a “noindex” tag to it.
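You can check the robots.txt side of this automatically. The sketch below uses Python’s `urllib.robotparser` to flag sitemap URLs that robots.txt blocks; the robots.txt rules and URLs are made-up examples and `blocked_sitemap_urls` is a hypothetical name. Python’s parser does not support Google’s `*` and `$` path wildcards and applies rules in file order rather than by Google’s most-specific-match rule, so treat the result as an approximation. It also doesn’t look for “noindex” tags, which would require fetching each page.

```python
from urllib.robotparser import RobotFileParser


def blocked_sitemap_urls(robots_txt, sitemap_urls, user_agent="Googlebot"):
    """Return the sitemap URLs that robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls if not parser.can_fetch(user_agent, url)]


robots_txt = """User-agent: *
Disallow: /private/
"""
sitemap_urls = [
    "https://www.example.com/",
    "https://www.example.com/private/report.html",
]
for url in blocked_sitemap_urls(robots_txt, sitemap_urls):
    print("Remove from sitemap or unblock:", url)
```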
Conclusion
Google offers instructions on how to create sitemaps and submit them to the search engine under the heading “Build and submit a sitemap.” Use a sitemap to make your website’s content easier for Google to find, so that more of your pages can be crawled, indexed and shown in search results.