As part of your Google Search Console and Analytics reports, you may receive notifications that certain pages aren't being indexed and that many pages are being redirected, even though you didn't enter them in your Redirect Center. The report may look something like this:
Google provides some training on understanding this report here: Page Indexing report. Here's more information on what it can mean: Why is my page missing from Google Search?
The URL inspection tool will be the most useful to assess whether the report and reality match. You also have the "validate" option so that you can look at the list of issues and tell Google that a particular page is valid and should be indexed.
Pages with redirect
The URLs listed may be from imports of your site data from another CMS or directly from a site you had with a different hosting provider, from pages that have been moved within your Metro Publisher Admin, or from pages that have had the URL changed manually within your Metro Publisher site. They can also be redirects to external URLs or redirects in your Redirect Center. Google is reporting that the original URL is being redirected to a new one, which is correct.
Pages that had an "http://" URL before browsers moved to the secure "https://" protocol will also be redirected. Once you switched to HTTPS only, i.e. when you added an SSL certificate to your site, our system began automatically redirecting all the insecure HTTP pages you had before, for your convenience and to make sure browsers don't flag your site as a security risk.
The redirects follow best practice. They are so-called 301 redirects, which tell search engines a page has permanently moved. You won't find all of those in your Redirect Center since you never manually set them up to redirect. Google eventually stops indexing the old pages, which is why these redirected pages appear under the Page Indexing heading. The new (correct and current) pages will be indexed.
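If you would like to confirm for yourself that a listed URL answers with a 301, a minimal Python sketch along these lines may help. It requests a URL without following the redirect and reports the status code and target. The function names `describe_status` and `first_hop` are made up for this illustration and are not part of Metro Publisher or Search Console.

```python
import http.client
from urllib.parse import urlsplit

# Status codes that signal a redirect, with a plain-language meaning.
REDIRECT_MEANINGS = {
    301: "permanent redirect (moved permanently)",
    302: "temporary redirect (found)",
    307: "temporary redirect",
    308: "permanent redirect",
}


def describe_status(status: int) -> str:
    """Return a short description of an HTTP status code for this check."""
    if status in REDIRECT_MEANINGS:
        return REDIRECT_MEANINGS[status]
    if status == 404:
        return "not found"
    if 200 <= status < 300:
        return "ok, no redirect"
    return "other"


def first_hop(url: str, timeout: float = 10.0):
    """Request a URL without following redirects.

    Returns a tuple: (status code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # http.client never follows redirects, so we see the first answer.
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

To use it, call `first_hop()` with a URL from the "Pages with redirect" list (for example a hypothetical `http://www.example.com/old-page`) and pass the returned status to `describe_status()`. A 301 with an "https://" target in the second value is the expected result for the HTTP-to-HTTPS case described above.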
Blocked by robots.txt
These are malicious bots and AI crawlers that spam websites and servers. Metro Publisher actively blocks these types of bots for all clients. Please visit our help document for more details: Protecting Your Site From Excessive Bot Traffic
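As a rough illustration of how a robots.txt file can shut out one bot while leaving other crawlers alone, the sketch below uses Python's standard `urllib.robotparser` module. The robots.txt content and the bot name `ExampleBadBot` are invented for this example and do not reflect the rules Metro Publisher actually ships.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleBadBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the given user agent may crawl the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)
for agent in ("Googlebot", "ExampleBadBot"):
    allowed = is_allowed(parser, agent, "https://www.example.com/articles/")
    print(agent, "allowed" if allowed else "blocked")
```

With these sample rules, the named bot is blocked from every path, while any other crawler is only kept out of `/private/`.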
404s (Not Found)
404s do not harm your site's indexing or ranking. A 404 page is only negative when it’s caused by a broken link on your site, because that leads to a negative user experience. Please visit our help article about this topic here: 404 Error Myth
We provide search engines with your sitemaps, metadata, and microdata, and the search engines crawl the links at their own discretion. We provide redirects where they apply, which is why the pages returning 404s are the ones that shouldn't be indexed. Any Content or Events that are in draft mode, expired, or deleted should return a 404 error. That is best practice. The report is a hint for you to check whether your 404s are correct, not a list of errors that need fixing. None of this means visitors are encountering broken links on your site, but please always check the list of links to make sure you don't have individual broken links on your site.
404 pages returning indexing errors are often also so-called "soft 404s", meaning the page could be a Tag page for a Tag not in use, or a page with little content, for example. If search engines deem a page too empty, they will mark it as having no value and read it as a 404 page and not index it, even though site visitors won’t see a 404 page if they visit that particular page.
To avoid those, make sure to delete automatically imported Tags, from stock photographs in particular, because those will be empty pages: Auto-generated Tags Due to Metadata on Images and Stock Photo Uploads
For a Tag page with little content, ideally you would tag more content with that Tag and thereby bulk up the page. Another option is to remove the Tag if it is not going to be attached to more content in the short term.
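To get a rough feel for which pages might be read as "too empty", you could count the visible words on a page that returns a normal 200 status. The sketch below does this with Python's standard library. Please treat it as a guess at the idea only: Google does not publish its soft 404 criteria, and the 50-word threshold and the names `count_visible_words` and `looks_thin` are assumptions made for this example.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect visible words, skipping script and style contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only count text that is outside script/style blocks.
        if not self._skip_depth:
            self.words.extend(data.split())


def count_visible_words(html: str) -> int:
    """Return the number of whitespace-separated visible words in the HTML."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return len(collector.words)


def looks_thin(status: int, html: str, min_words: int = 50) -> bool:
    """Flag pages that load normally (200) but have very little text.

    min_words is an arbitrary, assumed threshold, not a Google rule.
    """
    return status == 200 and count_visible_words(html) < min_words
```

A Tag page that only shows its own title would be flagged by `looks_thin()`, which makes it a candidate for either more tagged content or removal.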
If you'd like to keep any soft 404 pages with little content, we suggest you try to get Google to recrawl the pages as follows:
- Click on the “Submitted URL seems to be a soft 404” button from the Index Report, if you see that somewhere.
- By opening each of those links from within the Index Report in a new browser tab, you should be able to select a “Validate Fix” option. That would initiate recrawling of the page and hopefully an update of the status code to a perfectly valid page.
- You can reinspect the page afterwards and test the live URL, keeping in mind that recrawling isn't instant!
- Another option is to select the URL and click the “Inspect URL” option. That will lead you to the “Request Indexing” option for that page, which also triggers a recrawl.
If this is unsuccessful, you can resort to disallowing those URLs via the robots.txt file, and we can help you with that, of course. You shouldn't have to take this route before asking Google to fix its own mistake, however.
Here is Google's own support document on recrawling: Ask Google to Recrawl
Google needs a while to recrawl your site, so keep that in mind. Here is a quote from their document:
"Crawling can take anywhere from a few days to a few weeks. Be patient and monitor progress using either the Index Status report or the URL Inspection tool.
Requesting a crawl does not guarantee that inclusion in search results will happen instantly or even at all. Our systems prioritize the fast inclusion of high quality, useful content."
We still recommend requesting the reindexing.
Alternate page with proper canonical tag
This is partly caused by 301 (page permanently moved) redirects from imports, if an import was done.
When you redirect a URL, Google keeps track of both the redirect source (the old URL) and the redirect target (the new URL). This means it is registering duplicate content. One of the URLs will be the canonical URL. A canonical URL is the URL of a page that Google chooses as the most representative from a set of duplicate pages; which one depends on signals such as whether the redirect was temporary or permanent.
301 redirects define a page as “permanently moved” with the goal that the new URL be considered the representative / canonical one.
The other URL becomes an alternate name of the canonical URL. Alternate names may appear in search results when a user's query hints that they might trust the old URL more. In other words, alternate names are different versions of a canonical URL that users might recognize and trust more.
Here are additional explanations of what canonical URLs are and where they can be used:
301, 302, and Canonical URL Differences
Google may filter out or rank lower anything it considers true duplicate content, e.g. when websites cross-post each other's articles, so you'll have to look into those cases and apply canonical URLs where you feel they are justified.
Google AMP URLs may also be listed here if you have AMP activated, since it can take quite a while for Google to index AMP pages. There may also be errors in the URL, such as spaces (indicated by the string %20 in the URL), or incorrect canonical URL entries.
Here is a relevant help document from Google: Canonicalization for Crawling and Indexing by Google
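When you look at an individual URL from this list, it can help to see which canonical URL the page itself declares and whether the URL contains a space. The following is a minimal sketch using only Python's standard library. The names `find_canonical` and `has_space` are hypothetical, and the sketch assumes you already have the page's HTML as text.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated values, in any letter case.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


def has_space(url: str) -> bool:
    """Return True if the URL contains a raw or encoded (%20) space."""
    return " " in url or "%20" in url
```

If `find_canonical()` returns a URL other than the one you expect, or `has_space()` is true for a listed URL, those are the entries worth correcting first.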
Crawled – currently not indexed
These URLs belong to deleted pages, expired events, and closed locations. These pages should not be indexed. Google crawled them, noted that they should not be indexed, and either did not index them at this time or removed them from its index. The microdata we provide to search engines about events, e.g. start and end dates, also lets search engines know if an event is outdated.
That being said, we recommend deleting Expired Events and Content from your database: Housekeeping to Improve Site Speeds and Indexing
Please note that Google AMP pages may not be indexed by Google in a timely manner.
Discovered – currently not indexed
Whether these pages are indexed is at each search engine’s discretion. Their crawlers crawl when they want to and what they want to. Pages considered uninteresting or which haven’t been crawled yet for whatever reason will not be indexed.
No page on your website is set to not be crawled unless you have manually specified it in your robots.txt file. We include your sitemap URLs in the robots.txt file by default. The sitemaps point search engine crawlers to the pages of the site.
Please note the following Google quotes:
"Keep in mind that submitting a sitemap is merely a hint: it doesn't guarantee that Google will download the sitemap or use the sitemap for crawling URLs on the site."
"Crawling can take anywhere from a few days to a few weeks. Be patient and monitor progress using either the Index Status report or the URL Inspection tool. Requesting a crawl of a URL does not guarantee that inclusion in search results will happen instantly or even at all. Our systems prioritize the fast inclusion of high quality, useful content."
Sources: Build and Submit a Sitemap and Ask Google to recrawl your URLs
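To see which pages a crawler is being pointed to, you can read the sitemap URLs out of a robots.txt file and then list the page URLs inside a sitemap. The sketch below shows one way to do that with Python's standard library (the `site_maps()` method needs Python 3.8 or later). The sample robots.txt, the sample sitemap, and the function names are invented for this example. Your real files will contain your own URLs.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample files, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml
"""

SAMPLE_SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/articles/first/</loc></url>
  <url><loc>https://www.example.com/events/second/</loc></url>
</urlset>
"""


def sitemap_urls_from_robots(robots_txt: str) -> list:
    """Return the Sitemap: entries listed in robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: line is present.
    return parser.site_maps() or []


def page_urls_from_sitemap(xml_text: str) -> list:
    """Return every <loc> page URL found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


print(sitemap_urls_from_robots(SAMPLE_ROBOTS_TXT))
print(page_urls_from_sitemap(SAMPLE_SITEMAP_XML))
```

A URL that appears in the sitemap but is listed as "Discovered – currently not indexed" has been offered to the crawler. As the Google quotes above say, that is a hint and not a guarantee.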
Duplicate without user-selected canonical
URLs listed here need to be looked at individually.
Please also refer to the Google help document about canonicalization here: Canonicalization for Crawling and Indexing by Google
Another relevant Google help document is here: Duplicate URLs
Duplicate - Google chose different canonical
Just as above, these URLs need to be looked at individually.
Please refer also to the Google help document about canonicalization here: Canonicalization for Crawling and Indexing by Google
Another relevant Google help document is here: Duplicate URLs
Server error
You can check the URL detail for more information if an error is listed. It will usually reference a temporary disruption, e.g. when someone tries to upload an oversized or corrupted file (such as a ZIP) and the action times out, or the file cannot be saved because our system checks each such file.
