All Crawled Pages

Frequently Asked Questions

  • Why did the number of pages crawled on my site change? This could be due to the accessibility of your links and pages, the scope of your Campaign (check the URL you're tracking in your Campaign Settings), or the weekly crawl allowance, which can be adjusted in your Campaign Settings. Read more in our guides on how to investigate fluctuations in the number of pages crawled and troubleshooting pages not crawled.
  • Can I submit a sitemap to the Moz crawler? No, we do not currently support submitting a sitemap to our crawler. If your sitemap is linked on your site (not just in your robots.txt file), our crawler may attempt to crawl it. Note, however, that a sitemap is only a guide for a crawler; it does not guarantee the crawler will be able to access every page listed.
  • Can I compare crawl data over time? When a new Site Crawl is completed for your Campaign, it overwrites the data from the previous crawl, and past crawl data is no longer available. However, you can hover over the data points in the Total Pages Crawled and Issues graphs to see counts for past crawls and compare data over time.
  • Can I access data from past crawls? No. When a new Site Crawl is completed for your Campaign, it overwrites the data from the previous crawl, and past crawl data is no longer available.

What's Covered?

In this guide you’ll learn more about the All Crawled Pages section of Site Crawl within your Moz Pro Campaign.

Overview of Your Site's Crawled Pages

Want to see all the pages we crawled? All Crawled Pages is your pal. You'll see the total number of pages we crawled on your site. We'll crawl all the pages within the scope of your Moz Pro Campaign that we were able to find by following HTML links from one page to the next.

To see a breakdown of your crawled pages, head to Site Crawl > All Crawled Pages.

Site Crawl > All Crawled Pages menu location in the left-hand navigation.

Within the chart you'll see Total Pages Crawled for every crawl of your site; hover over any data point in the graph for more information. You can use this to check for big increases or decreases in pages crawled and to monitor changes in your site's crawlability. If we were able to crawl many more pages on your site, you may also see an increase in the number of New Issues.

You'll also see a breakdown of your Pages by Status Code. The more 200s the better! 301s are normal, provided they are what you intended. Keep an eye out for 4xx and 5xx statuses, which can affect your site's accessibility.
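The status-class grouping behind that breakdown can be sketched in a few lines. A minimal illustration (the status codes below are made-up sample data, not real crawl output):

```python
# Sketch: bucket a list of crawled-page status codes into their HTTP classes
# (2xx, 3xx, 4xx, 5xx), the way the Pages by Status Code chart groups them.
from collections import Counter

def bucket_status_codes(codes):
    """Group HTTP status codes by class: 200 -> '2xx', 404 -> '4xx', etc."""
    return Counter(f"{code // 100}xx" for code in codes)

# Illustrative sample data only.
counts = bucket_status_codes([200, 200, 301, 200, 404, 500, 301])
print(counts)  # Counter({'2xx': 3, '3xx': 2, '4xx': 1, '5xx': 1})
```

A tally like this makes it easy to spot when 4xx or 5xx responses start creeping up between crawls.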

Click on a segment of the pie chart to filter the results by page status.


You can also use the filters provided to select which of your All Crawled Pages you’d like to view and export. You have the option to filter by partial or complete URL, Status Code, and whether or not the page has Issues flagged for it. Click the magnifying glass in the Analyze column to view more information about that particular page.

Filter and sort options in the All Crawled Pages view.

Export All Crawled Pages

You can export your full site inventory from the All Crawled Pages section of your Campaign at any time. We offer the ability to export this data to PDF or CSV.

Export your All Crawled Pages to PDF via the link at the top right and to CSV via the link at the lower right.

Any filters in place at the time of your export will persist in the export itself.

When exporting to .csv, the following information will be included:

A. URL - The URL crawled.

B. Referring URL - Where the crawler found the URL. This may be another redirected URL or a page on your site.

C. Title - The title tag found in the source code of the page. A blank cell indicates no title tag was found.

D. Meta Description - The meta description found in the source code of the page. A blank cell indicates no meta description was found.

E. Status Code - The status code returned to our crawler when it attempted to access the URL.

F. Page Speed - The loading time for this page in seconds.

G. Canonical URL - The canonical URL found in the source code of the page. A blank cell indicates no canonical tag was found.

H. Redirect Location - If the URL crawled redirects to another page or URL, the redirect location is where that redirect points. For example, if the URL http://mysite.com redirects to http://mysite.com/home, the redirect location would be noted as http://mysite.com/home.

I. Link Count - The number of internal links we crawled on this page.

J. Page Authority - The Page Authority for this page. Page Authority is a Moz proprietary metric, scored 1-100, which predicts how well a page will rank in Google based on machine learning analysis of link metrics.

K. Issue Count - The number of Site Crawl issues identified for this page/URL.

L. Word Count - The number of words counted in the body of the page.

M. Crawl Depth - How many links/pages the crawler followed from the home page to get to this URL. The home page or seed URL of your Campaign will have a crawl depth of 0.

N. No Follow - Notes whether the page is marked as nofollow with a TRUE or FALSE value, where TRUE indicates the page is flagged as nofollow.

O. No Index - Notes whether the page is marked as noindex with a TRUE or FALSE value, where TRUE indicates the page is flagged as noindex.

P. Robots - If found, the contents of the meta robots tag will be noted here.

Q. X-robots - If found, the contents of the X-robots-tag will be noted here.

R. URL Length - The character count of the URL.

S. Duplicate Content Group - If the page is flagged as duplicate content, the Duplicate Content Group will be noted in this column by a numerical value. All pages with the same numerical value for Duplicate Content Group are considered part of the same issue group and are flagged as duplicates of one another.

T. Duplicate Title Tag Group - If the page is flagged as having a duplicate title tag, the Duplicate Title Tag Group will be noted in this column by a numerical value. All pages with the same numerical value for Duplicate Title Tag Group are considered part of the same issue group and are flagged as having duplicate title tags.

U. Redirect 4xx Group - If the URL redirects to a 4xx page, the Redirect 4xx Group will be noted in this column by a numerical value. All pages with the same numerical value for Redirect 4xx Group are considered part of the same issue group and are flagged as redirecting to 4xx.

V. Redirect Chain Group - If the URL is part of an identified redirect chain, the Redirect Chain Group will be noted in this column by a numerical value. All pages with the same numerical value for Redirect Chain Group are considered part of the same issue group and are flagged as part of the identified redirect chain.
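Once exported, the CSV is easy to work with in a script. Here's a small sketch that pulls out pages returning 4xx or 5xx status codes; the header names are assumptions based on the column list above, so check them against a real export before relying on this:

```python
# Sketch: read an All Crawled Pages CSV export and list pages that returned
# 4xx/5xx status codes. The column headers and sample rows are illustrative
# assumptions, not real export data.
import csv
import io

SAMPLE_EXPORT = """URL,Status Code,Issue Count
https://mysite.com/,200,0
https://mysite.com/old-page,404,1
https://mysite.com/broken,500,2
"""

def error_pages(csv_text):
    """Return (url, status) pairs for rows with a 4xx or 5xx status code."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["URL"], int(row["Status Code"]))
            for row in reader
            if int(row["Status Code"]) >= 400]

print(error_pages(SAMPLE_EXPORT))
# [('https://mysite.com/old-page', 404), ('https://mysite.com/broken', 500)]
```

For a real export, replace the inline sample with `open("export.csv")` and pass the file object to `csv.DictReader` directly.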

Fluctuations in the Number of Pages Crawled

Our Site Crawl bot, Rogerbot, finds pages by crawling all of the HTML links on the homepage of your site. It then crawls those pages and their HTML links, and so on, until all of the pages we can find are crawled for the site, subdomain, or subfolder you entered when you created your Campaign.
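That link-following process is essentially a breadth-first traversal, which is also how each page ends up with a Crawl Depth value. A toy sketch, using a made-up link graph in place of real HTML links:

```python
# Sketch: breadth-first crawl starting from the seed URL (depth 0), following
# links level by level and recording each page's crawl depth. The link graph
# here is an illustrative stand-in for links found in real HTML.
from collections import deque

def crawl_depths(seed, links):
    """links maps each URL to the URLs it links to; returns URL -> depth."""
    depths = {seed: 0}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first time this page is reached
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

site = {
    "/": ["/about", "/blog"],
    "/blog": ["/blog/post-1"],
}
print(crawl_depths("/", site))
# {'/': 0, '/about': 1, '/blog': 1, '/blog/post-1': 2}
```

Note that any page unreachable from the seed URL simply never appears in the result, which mirrors why unlinked pages don't show up in your crawl.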

Usually, if a page is linked to from the homepage, it should end up getting crawled. If it doesn't, that's a sign those pages may not be as accessible to search engines as they could be.

Here are some things that can affect our ability to crawl your site:

  • Broken or lost internal links
  • A site built primarily with JavaScript; if your links are rendered by JavaScript, we won't be able to parse those links
  • Meta tags or robots.txt telling Rogerbot not to crawl certain areas of the site
  • Lots of 5xx or 4xx errors in your crawl results
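For the meta tag case, here's a small sketch of how a crawler might detect a meta robots directive, using only Python's standard library. This is an illustration, not how Rogerbot is actually implemented, and a real crawler would also honor robots.txt and the X-Robots-Tag response header:

```python
# Sketch: detect a meta robots tag that tells crawlers to skip a page.
from html.parser import HTMLParser

class MetaRobotsParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            content = attrs.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]

# Illustrative page source, not real crawl data.
page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
parser = MetaRobotsParser()
parser.feed(page)
print(parser.directives)  # ['noindex', 'nofollow']
```

A page flagged this way shows up in the export's No Index / No Follow columns with a TRUE value.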

Learn more about fluctuations in pages crawled and how to monitor them with Moz Pro

Moz Crawler & JavaScript

Our crawler can't parse JavaScript very well, so if your site is built with a lot of JavaScript, like Wix or another site-builder platform, we may not be able to find HTML links on your site to follow and crawl.

How do I know if my site has been built with JavaScript? You can check your page source code if you know how, or you can turn off JavaScript in your browser to see if your site loads. Here are the steps:

  1. Open your site in your browser
  2. Temporarily turn off JavaScript in your browser (in Chrome: chrome://settings/content/javascript)
  3. Refresh the page and see if your site's elements render in the browser
  4. Don't forget to turn JavaScript back on when you're done
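To see why this matters to a crawler, here's a small sketch: a parser reading raw HTML (as a non-JavaScript crawler does) only finds links present in the source, never links injected by scripts. The page source below is a made-up example:

```python
# Sketch: collect <a href> links from raw HTML. Links created by JavaScript at
# runtime never appear in the raw source, so a non-JS crawler can't see them.
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag found in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

# Illustrative page source: one real link, one injected by a script.
raw_html = """
<a href="/contact">Contact</a>
<script>
  // This link only exists after the script runs; the parser never sees it.
  document.body.innerHTML += '<a href="/hidden">Hidden</a>';
</script>
"""
collector = LinkCollector()
collector.feed(raw_html)
print(collector.links)  # ['/contact']
```

This is why a JavaScript-heavy site can produce a crawl with surprisingly few pages.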

We don't currently have a good workaround for crawling sites built primarily with JavaScript. You may want to try Screaming Frog, which we know handles JavaScript.

