Make things easier for Googlebot, not harder. You want Google to concentrate on your most important pages rather than churning through countless unnecessary URLs.
The Index Coverage report shows you what may be making your site bigger than it needs to be from a crawling and processing point of view.
There are all sorts of reasons why you may be making Google crawl and process far more URLs than it needs to. In addition, several of the types in this report flag URLs that were submitted in XML sitemaps but don't resolve correctly.
Investigate carefully to find out what is going on.
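If you maintain XML sitemaps, a quick self-check can surface broken sitemap URLs before they show up in the report. Below is a minimal Python sketch using the requests library; the sitemap URL is a placeholder, and it assumes a standard urlset sitemap rather than a sitemap index.

```python
# Minimal sketch: fetch an XML sitemap and flag every URL that doesn't
# resolve with a 200. Assumes a standard <urlset> sitemap, not a sitemap index.
import requests
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://www.example.com/sitemap.xml"  # placeholder sitemap URL
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(sitemap_url):
    """Return all <loc> entries from the sitemap."""
    response = requests.get(sitemap_url, timeout=10)
    response.raise_for_status()
    return [loc.text.strip() for loc in ET.fromstring(response.content).findall(".//sm:loc", NS)]

for url in sitemap_urls(SITEMAP_URL):
    status = requests.head(url, allow_redirects=False, timeout=10).status_code
    if status != 200:  # switch to requests.get() if your server dislikes HEAD
        print(status, url)
```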
The Index Coverage report distinguishes four statuses:
- Valid: pages that have been indexed.
- Valid with warnings: pages that have been indexed but have issues you may want to look into.
- Excluded: pages that weren't indexed because search engines picked up clear signals that they shouldn't be indexed.
- Error: pages that couldn't be indexed for some reason.
Each status contains one or more types. Below we describe each type, whether any action is required, and if so, what to do.
Valid URLs
"Valid URLs" are pages that have been indexed.
The “Valid” status is applicable to the following two types:
- Submitted and indexed
- Indexed, not submitted in sitemap
Submitted and indexed
These URLs were submitted via an XML sitemap and were subsequently indexed.
No action is necessary.
Indexed, not submitted in sitemap
Even though these URLs weren't submitted via an XML sitemap, Google still found and indexed them.
Check whether these URLs should be indexed. If they should, add them to your XML sitemap. If not, apply the robots noindex directive and, if crawl budget is a concern, disallow them in your robots.txt file.
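To verify that a noindex directive actually ends up on these pages, you can check both places Google looks. Below is a minimal Python sketch that uses requests and a simple regex rather than a full HTML parser; the URL is a placeholder.

```python
# Minimal sketch: check whether a URL carries a noindex directive, either in
# the X-Robots-Tag HTTP header or in a <meta name="robots"> tag.
import re
import requests

def has_noindex(url):
    response = requests.get(url, timeout=10)
    # X-Robots-Tag header check
    if "noindex" in response.headers.get("X-Robots-Tag", "").lower():
        return True
    # Simple scan for a robots meta tag in the HTML source
    meta = re.search(r'<meta[^>]+name=["\']robots["\'][^>]*>', response.text, re.IGNORECASE)
    return bool(meta and "noindex" in meta.group(0).lower())

print(has_noindex("https://www.example.com/some-page"))  # placeholder URL
```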
Pro tip
If you have an XML sitemap but simply haven't submitted it to Google Search Console, all URLs will be reported under "Indexed, not submitted in sitemap", which is a little misleading.
Valid URLs with warnings
There are just two types of the “Valid with warnings” status:
- “Indexed, though blocked by robots.txt”
- “Indexed without content”
Indexed, though blocked by robots.txt
Google has indexed these URLs, but they were blocked by your robots.txt file. Normally Google wouldn't have indexed them, but it appears links to these URLs were found, so Google indexed them anyway. The snippets shown for these URLs are likely to be suboptimal.
What must be done:
Review these URLs, update your robots.txt file, and consider applying robots noindex directives.
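When reviewing these URLs, it helps to confirm exactly which ones your robots.txt blocks for Googlebot. A minimal sketch using Python's built-in urllib.robotparser is shown below; the domain and URL list are placeholders.

```python
# Minimal sketch: test which URLs Googlebot may fetch according to robots.txt.
# The domain and the URL list are placeholders.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser("https://www.example.com/robots.txt")
parser.read()

urls_from_report = [
    "https://www.example.com/private/report.html",
    "https://www.example.com/blog/some-post",
]

for url in urls_from_report:
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(verdict, url)
```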
Indexed without content
These URLs have been indexed by Google, but Google was unable to find any content on them.
Reasons for this could include:
- The webpage was left blank.
- The information is in a format that Google cannot index.
- Cloaking
- Google was unable to render the page because it was blocked, for instance receiving a 403 HTTP status code.
What must be done:
Double-check these URLs to confirm they really don't contain any content. Request them in your browser and with the URL Inspection tool in Google Search Console to see what Google sees. If everything looks fine, simply request reindexing.
Excluded URLs
The "Excluded" status includes the following types:
- Alternate page with proper canonical tag
- Blocked by page removal tool
- Blocked by robots.txt
- Blocked due to access forbidden (403)
- Blocked due to other 4xx issue
- Blocked due to unauthorized request (401)
- Crawled – currently not indexed
- Discovered – currently not indexed
- Duplicate without user-selected canonical
- Duplicate, Google chose different canonical than user
- Duplicate, submitted URL not selected as canonical
- Excluded by ‘noindex’ tag
- Not found (404)
- Page removed because of legal complaint
- Page with redirect
- Soft 404
Alternate page with proper canonical tag
These URLs are duplicates of other URLs and are correctly canonicalized to the preferred version of the URL.
What must be done:
If these pages shouldn't be canonicalized to other URLs, change their canonical to be self-referencing. Also keep an eye on the number of pages reported here: if it rises sharply while the number of indexable pages on your site hasn't, you may have problems with your internal linking structure and/or crawl budget.
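To spot-check how these pages are canonicalized (or to confirm a canonical is self-referencing after you change it), a small script can pull the canonical out of the HTML. This is a rough Python sketch that relies on a simple regex rather than a full HTML parser, and the example URL is hypothetical.

```python
# Minimal sketch: extract the rel="canonical" URL from a page and report
# whether it is self-referencing. Uses a simple regex, not a full HTML parser.
import re
import requests

def canonical_of(url):
    html = requests.get(url, timeout=10).text
    tag = re.search(r'<link[^>]+rel=["\']canonical["\'][^>]*>', html, re.IGNORECASE)
    if not tag:
        return None
    href = re.search(r'href=["\']([^"\']+)["\']', tag.group(0), re.IGNORECASE)
    return href.group(1) if href else None

url = "https://www.example.com/category?page=2"  # placeholder URL
canonical = canonical_of(url)
print("canonical:", canonical, "| self-referencing:", canonical == url)
```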
Blocked by page removal tool
These URLs are currently not shown in Google's search results because of a URL removal request. Hiding URLs this way keeps them out of Google's search results for 90 days; after that period, Google may surface them again.
The URL removal request feature only hides URLs for a short time, so we always recommend taking additional measures to prevent these URLs from reappearing.
What must be done:
Apply the robots noindex directive to give Google a clear signal not to index these URLs, and make sure they are recrawled before the 90-day period expires.
Blocked by robots.txt
Google isn't indexing these URLs because of the site's robots.txt file. That means Google hasn't found signals strong enough to justify indexing them; if it had, the URLs would be listed under "Indexed, though blocked by robots.txt".
What must be done:
Make sure this overview doesn't contain any important URLs.
Blocked due to access forbidden (403)
Google wasn't allowed to access these URLs and received a 403 HTTP response code instead.
What must be done:
Make sure the URLs you want to rank with are always accessible to Google and other search engines. If URLs you don't want to rank with are listed under this type, it's better to simply apply the noindex directive.
Blocked due to other 4xx issue
Google couldn't access these URLs because they returned 4xx response codes other than 401, 403, and 404. This can be caused by malformed URLs, for instance, which often return a 400 response code.
What must be done:
Try fetching these URLs with the URL Inspection tool to see whether you can reproduce this behaviour. If these URLs matter to you, investigate, fix the problem, and add the URLs to your XML sitemap. If you don't want to rank with them, just make sure to remove any references to these URLs.
Blocked due to unauthorized request (401)
Google can't access these URLs because it was denied access when requesting them, as indicated by the 401 HTTP response code. This is often the case for staging environments that use HTTP authentication to keep the public out.
What must be done:
Make sure this overview doesn't contain any important URLs. If it does, find out why, because this would be a serious SEO problem.
If your staging environment is on the list, find out how Google discovered it and remove any references to it.
Keep in mind that both internal and external links may be to blame, and if search engines can find those references, visitors probably can too.
Crawled – currently not indexed
Google crawled these URLs but hasn't indexed them (yet). Possible reasons for a URL having this type:
- The URL was just crawled, and indexing is currently pending.
- Google knows the URL but hasn't deemed it valuable enough to index, for instance because of thin or duplicate content or a lack of internal links.
What must be done:
Check whether this overview contains any important URLs. If it does, find out when they were crawled. If the crawl is relatively recent and the URL has enough internal links, it's likely to be indexed soon.
Discovered – currently not indexed
Google has discovered these URLs but hasn't crawled or indexed them yet. They are known to Google and have been added to the crawl queue. This can happen because Google tried to crawl these URLs before but couldn't, for instance because the site was overloaded.
What must be done:
Keep an eye on this. If the number of URLs grows, you may be running into crawl budget issues: your site requires more attention than Google is willing to give it. This can happen when your site is too slow, unreliable, or lacks sufficient authority.
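One quick sanity check you can run yourself is to time a few representative pages; consistently slow responses make your site less attractive to crawl. A minimal Python sketch, with placeholder URLs:

```python
# Minimal sketch: time a few representative pages to see whether slowness
# could be contributing to crawl budget problems. URLs are placeholders.
import requests

sample_urls = [
    "https://www.example.com/",
    "https://www.example.com/category/widgets",
    "https://www.example.com/product/blue-widget",
]

for url in sample_urls:
    response = requests.get(url, timeout=30)
    print(f"{response.elapsed.total_seconds():.2f}s", response.status_code, url)
```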
Duplicate without user-selected canonical
Google considers these URLs duplicates. They aren't canonicalized to the preferred version of the URL, and Google doesn't regard them as the preferred versions, so it has chosen to keep them out of its index. Among these URLs you'll often find PDF files that are exact copies of other PDFs.
What must be done:
Add canonical URLs that reference the preferred versions of these URLs. If these URLs shouldn't be indexed at all, apply the noindex directive via the meta robots tag or the X-Robots-Tag HTTP header. When you check these URLs with the URL Inspection tool, Google may even show you which canonical it selected.
Duplicate, Google chose different canonical than user
Google found these URLs on its own and considers them duplicates. Even though you canonicalized them to your preferred URL, Google has chosen to ignore that and apply a different canonical. You'll often see this on multilingual websites with very similar pages and sparse content, where Google picks its own canonicals.
What must be done:
Use the URL Inspection tool to find out which URL Google has chosen as the canonical, and evaluate whether that choice makes more sense. Google may have picked a different canonical because it has more links and/or more substantial content, for example.
Duplicate, submitted URL not selected as canonical
You submitted these URLs via an XML sitemap, but they don't have canonical URLs defined. Google considers them duplicates of other URLs and has therefore canonicalized them to canonicals of Google's own choosing.
Please note that while Google also chose a different canonical than the user for this type, it differs from the "Duplicate, Google chose different canonical than user" type in two ways:
- You explicitly asked Google to index these URLs.
- You haven't defined canonical URLs.
What must be done:
Add correct canonical URLs that reference the preferred version of each URL.
Excluded by ‘noindex’ tag
Google didn't index these URLs because they carry the noindex directive (either in the HTML source or in the HTTP header).
What must be done:
Make sure this overview doesn't contain any important URLs. If it does, remove the noindex directive and request indexing via the URL Inspection tool. Since you presumably don't want these noindexed pages to be easily found by visitors, also make sure there are no internal links pointing to them.
Not found (404)
These URLs weren't submitted via an XML sitemap, but Google discovered them somehow and can't index them because they return an HTTP status code 404. Google may have come across these URLs on other websites, or they may have existed in the past.
What must be done:
Make sure this overview doesn't contain any important URLs. If it does, either restore the content on these URLs or 301 redirect them to the most relevant alternative. Note that if the redirect doesn't point to a highly relevant alternative, the URL may be treated as a soft 404.
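If you decide to redirect, it's worth verifying that each old URL really returns a 301 pointing at the replacement you picked. Here's a minimal Python sketch with a hypothetical mapping of old to new URLs:

```python
# Minimal sketch: verify that removed URLs 301-redirect to their intended
# replacements. The old-to-new mapping is a placeholder.
import requests

redirect_map = {
    "https://www.example.com/old-product": "https://www.example.com/new-product",
    "https://www.example.com/old-guide": "https://www.example.com/guides/new-guide",
}

for old_url, expected in redirect_map.items():
    response = requests.get(old_url, allow_redirects=False, timeout=10)
    target = response.headers.get("Location")
    ok = response.status_code == 301 and target == expected
    print("OK" if ok else "CHECK", old_url, "->", response.status_code, target)
```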
Page removed because of legal complaint
Google removed these URLs from its index because of a legal complaint.
What must be done:
Make sure you're aware of every URL listed in this overview, because it's possible that someone who means you harm has asked Google to remove some of these URLs from its index.
Page with redirect
These URLs redirect, so Google doesn't index them.
What must be done:
None.
Pro tip
This overview of redirecting pages is helpful when making a redirect plan if you’re working on a website migration.
Soft 404
These URLs are considered "soft 404s": they don't actually return an HTTP status code 404, but they give the impression of being 404 pages, for instance by displaying a message like "Page can't be found." Alternatively, these errors can be caused by redirects pointing to pages Google considers insufficiently relevant, such as a product detail page redirecting to one of its category pages or even to the homepage.
What must be done:
If these URLs really are 404s, make sure they return a proper 404 HTTP status code. If they aren't 404s at all, make sure the content reflects that.
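A rough way to triage a long list of these URLs is to look for 200 responses whose content reads like an error page. The Python sketch below uses a few example phrases and an arbitrary length threshold as heuristics, so tune both to your own templates.

```python
# Minimal sketch: flag likely soft 404s, i.e. URLs returning a 200 whose
# content looks like an error page or is suspiciously thin. The phrases and
# the length threshold are arbitrary; adjust them to your own templates.
import requests

ERROR_PHRASES = ("page can't be found", "page not found", "no longer available")

def looks_like_soft_404(url):
    response = requests.get(url, timeout=10)
    if response.status_code != 200:
        return False  # a real error code, so not a soft 404
    body = response.text.lower()
    return any(phrase in body for phrase in ERROR_PHRASES) or len(body) < 1024

print(looks_like_soft_404("https://www.example.com/discontinued-product"))  # placeholder
```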
Error URLs
The “Error” status contains the following types:
- Redirect error
- Server error (5xx)
- Submitted URL blocked by robots.txt
- Submitted URL blocked due to other 4xx issue
- Submitted URL has crawl issue
- Submitted URL marked ‘noindex’
- Submitted URL not found (404)
- Submitted URL seems to be a Soft 404
- Submitted URL returned 403
- Submitted URL returns unauthorized request (401)
Redirect error
Google was unable to crawl these redirecting URLs because of redirect problems. A few examples of issues Google may have run into:
- Redirect loops
- Redirect chains that are too long (Google follows up to five redirects per crawl attempt)
- Redirect to an excessively long URL
What must be done:
Look into the cause of these redirects and make the necessary corrections.
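To see where a chain goes wrong, you can trace it hop by hop. A minimal Python sketch (the start URL is a placeholder):

```python
# Minimal sketch: follow a redirect chain hop by hop to spot loops and chains
# longer than the five redirects Google follows per crawl attempt.
import requests
from urllib.parse import urljoin

def trace_redirects(url, max_hops=10):
    seen = []
    while len(seen) < max_hops:
        seen.append(url)
        response = requests.get(url, allow_redirects=False, timeout=10)
        if response.status_code not in (301, 302, 303, 307, 308):
            return seen  # reached the final destination
        url = urljoin(url, response.headers["Location"])
        if url in seen:
            return seen + [url]  # redirect loop detected
    return seen

for hop in trace_redirects("https://www.example.com/old-url"):  # placeholder URL
    print(hop)
```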
Server error (5xx)
These URLs returned a 5xx error, which prevented Google from indexing them.
What must be done:
Find out why the URL is returning a 5xx error and fix it. You'll often see that these 5xx errors are only temporary, for instance because the server was overloaded. Make your requests with Googlebot's user-agent, because the user-agent used may influence the HTTP status code that's returned.
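Since the response can differ per user-agent, it helps to fetch the URL both as Googlebot and as a regular client and compare the status codes. A minimal Python sketch (the URL is hypothetical):

```python
# Minimal sketch: compare the status code served to a Googlebot user-agent
# with the one served to a generic request. The URL is a placeholder.
import requests

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
url = "https://www.example.com/flaky-page"

for label, headers in [("googlebot", {"User-Agent": GOOGLEBOT_UA}), ("default", {})]:
    print(label, requests.get(url, headers=headers, timeout=10).status_code)
```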
Submitted URL blocked by robots.txt
These URLs were submitted via an XML sitemap, but they weren't indexed because they're blocked through the robots.txt file. This type is very similar to two other types we covered earlier.
How is this one different, then?
- If these URLs had been indexed, they would be listed under "Indexed, though blocked by robots.txt".
- If these URLs hadn't been submitted via an XML sitemap, they would be listed under "Blocked by robots.txt".
These small variations are quite helpful for diagnosing problems like these.
What must be done:
If important URLs are listed here, make sure your robots.txt file no longer blocks them. You can find the responsible robots.txt directive by selecting a URL and clicking the TEST ROBOTS.TXT BLOCKING button on the right-hand side. Any URLs that Google shouldn't be able to access should be removed from the XML sitemap.
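You can also catch this conflict before Google reports it by cross-checking the sitemap against robots.txt. A minimal Python sketch in the same spirit as the earlier examples (the sitemap and robots.txt URLs are placeholders):

```python
# Minimal sketch: cross-check every URL in an XML sitemap against robots.txt
# and list the ones Googlebot isn't allowed to fetch. URLs are placeholders.
import requests
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

robots = RobotFileParser("https://www.example.com/robots.txt")
robots.read()

sitemap = ET.fromstring(requests.get("https://www.example.com/sitemap.xml", timeout=10).content)
for loc in sitemap.findall(".//sm:loc", NS):
    url = loc.text.strip()
    if not robots.can_fetch("Googlebot", url):
        print("blocked by robots.txt:", url)
```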
Submitted URL blocked due to other 4xx issue
You submitted these URLs via an XML sitemap, but Google received 4xx response codes other than the 401, 403, and 404 covered by the other types.
What must be done:
Try fetching these URLs with the URL Inspection tool to see whether you can reproduce the problem. If you can, investigate what's going on and fix it. If these URLs don't work correctly and shouldn't be indexed, remove them from the XML sitemap.
Submitted URL has crawl issue
These URLs were submitted via an XML sitemap, but Google ran into issues while crawling them. The "Submitted URL has crawl issue" type covers crawl issues that don't fit into any of the other types.
These crawl issues are often temporary in nature and will receive a "regular" classification (for instance "Not found (404)") when they are checked again.
What must be done:
Try fetching some of these URLs with the URL Inspection tool to see whether you can reproduce the problem. If you can, investigate what's going on. If you can't find any problems and everything works as it should, keep monitoring this, as it may just have been a temporary issue.
Submitted URL marked ‘noindex’
These URLs were submitted via an XML sitemap, but they carry the noindex directive (either in the HTML source or in the HTTP header).
What must be done:
- If important URLs are listed here, remove the noindex directive from them.
- Remove URLs that shouldn't be indexed from the XML sitemap.
Submitted URL not found (404)
The URLs you submitted via an XML sitemap appear not to exist.
The only difference between this type and the "Not found (404)" type covered earlier is that in this case the URLs were submitted via an XML sitemap.
What must be done:
- If important URLs are listed here, restore their content or 301 redirect them to the most relevant alternative.
- If not, remove these URLs from the XML sitemap.
Submitted URL seems to be a Soft 404
You submitted these URLs via an XML sitemap, but Google considers them soft 404s. These URLs may return an HTTP status code 200 while displaying what looks like a 404 page, or their content otherwise gives the impression of a 404.
The only difference between this type and the "Soft 404" type covered earlier is that in this case you submitted these URLs via an XML sitemap.
What must be done:
- If these URLs really are 404s, make sure they return a proper 404 HTTP status code and remove them from the XML sitemap.
- If they aren't 404s at all, make sure the content reflects that.
Submitted URL returned 403
You submitted these URLs via an XML sitemap, but Google wasn't allowed to access them and received a 403 HTTP response.
This type is very similar to the one below; the only difference is that with a 401 HTTP response, login credentials are required.
What must be done:
If these URLs should be publicly accessible, provide access without requiring authorization. If not, remove these URLs from the XML sitemap.
Submitted URL returns unauthorized request (401)
These URLs were submitted via an XML sitemap, but Google wasn't authorized to access them, as indicated by the 401 HTTP response code. This is often seen with staging environments that use HTTP authentication to keep the public out.
The only difference between this type and the "Blocked due to unauthorized request (401)" type covered earlier is that in this case the URLs were submitted via an XML sitemap.
What must be done:
Check whether the 401 HTTP status code is being returned correctly. If so, remove these URLs from the XML sitemap. If not, give Google access to these URLs.
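Verifying that the 401 is actually served is easy to automate; here's a minimal Python sketch with placeholder staging URLs:

```python
# Minimal sketch: confirm that protected URLs answer unauthenticated requests
# with a 401. The staging URLs are placeholders.
import requests

protected_urls = [
    "https://staging.example.com/",
    "https://staging.example.com/checkout",
]

for url in protected_urls:
    status = requests.get(url, timeout=10).status_code
    print("OK (401)" if status == 401 else f"unexpected {status}", url)
```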
FAQ
Even though Google says the Index Coverage report is only useful for websites with more than 500 pages, we recommend it to everyone who relies heavily on organic traffic. It provides very comprehensive information and is far more reliable for troubleshooting indexing problems than the site: operator, so you don't want to miss out on it.
There are many possible causes, but canonicalized URLs, redirecting URLs, and URLs blocked via robots.txt are often present in large numbers. That adds up quickly, especially for large sites.
The "Discovered – currently not indexed" type means Google knows about these URLs but hasn't crawled (and therefore hasn't indexed) them yet. If you have a small website (fewer than 10,000 pages) with high-quality content, this usually resolves itself once Google has crawled the URLs.