Moz Can't Crawl Your Site

Frequently Asked Questions

  • You don't need a robots.txt file for us to crawl your site; however, an error code response for your robots.txt can affect the crawlability of your site. As a general rule, we recommend that you have an accessible robots.txt file set up for your site.
  • Check your robots.txt file by entering your domain in your browser and adding /robots.txt after it. For example, moz.com/robots.txt.
  • We start crawling from the http protocol and follow redirects and links from there to crawl your site.
  • Yes. If you're seeing a crawl error, you'll want to make sure you are not blocking AWS (Amazon Web Services).
  • Unfortunately, we do not use a static IP address or range of IP addresses for rogerbot. Please read our rogerbot guide for more information.

Troubleshooting Issues Crawling Your Site

You’ve just been notified that we're having trouble crawling your site. To crawl your site successfully, we need to be able to access your robots.txt file, starting at the non-www HTTP version of the site. For example, http://moz.com/robots.txt.

To get started, try these steps.

  1. In your browser, check that your homepage and robots.txt file are accessible from the HTTP version of your site, and that the correct redirects to the HTTPS version are in place, if you have HTTPS set up on your site
  2. Check that your robots.txt file is accessible in your browser, for example moz.com/robots.txt
  3. Check that your robots.txt file is accessible to crawlers using a tool like http://httpstatus.io/
  4. Check the directives in your robots.txt file to make sure you're not blocking all bots, or our bot rogerbot (see the example below this list)
  5. Check how your server is responding to rogerbot in your server logs, which you can obtain from your website host or administrator
  6. Try a recrawl through Moz Pro Site Crawl (if you have a Moz Pro Medium plan or above)
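
For reference, a robots.txt file that blocks all crawlers (including rogerbot) contains a group like the first one below. The second group is a sketch of what you could add to explicitly allow rogerbot while leaving your other rules alone; the exact directives your site needs depend on what else is in the file:

    # Blocks every crawler from the whole site:
    User-agent: *
    Disallow: /

    # Explicitly allows rogerbot to crawl the whole site,
    # even if other bots are restricted:
    User-agent: rogerbot
    Disallow:

Crawlers follow the most specific User-agent group that matches them, so a rogerbot group with an empty Disallow line lets us crawl even when the wildcard group is restrictive.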

Once you have fixed any issues with your robots.txt, start a recrawl of your site using the Recrawl button in Moz Pro Site Crawl. You'll need a Moz Pro Medium plan or above to use this feature.

Our Crawler Was Banned by a Page on Your Site

Example of the error message seen in-app when rogerbot is banned from crawling.

If you’ve received a message that our crawler was banned from your site, either through your robots.txt file or by an X-Robots-Tag HTTP header on your page, there are a few things you can check. First, you’ll want to check that there isn’t a disallow directive in place in your robots.txt file that is keeping us from crawling your page. You can read more about rogerbot and how it crawls your site! Another place you’ll want to check is for a robots meta tag, the on-page equivalent of the X-Robots-Tag HTTP header. This would be found in the source code of your homepage and would look like this:

    <meta name="robots" content="noindex, nofollow">

If our crawler encounters a nofollow directive in this meta tag, it won’t be able to follow any of the links on that page and the crawl will stop. If this is in place on your homepage, that means we won’t be able to move past that page and you’ll see a 1 page crawl report.
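
The X-Robots-Tag itself is sent as an HTTP response header rather than in your page's HTML. One quick way to check for it, assuming you have curl available on the command line and swapping your own domain in for the placeholder, is to request just the headers for your homepage:

    curl -sI https://www.yoursite.com/ | grep -i 'x-robots-tag'

If this prints a header containing noindex or nofollow, that directive is likely what's stopping the crawl.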

One last thing you should check in this situation is that you’re not blocking Moz on a server level. If rogerbot or AWS is blocked by your server, we won’t be able to access your site, regardless of whether or not the robots.txt file has a disallow directive! If you are blocking rogerbot at a server level this is most likely something put in place by your website administrator, so it's best to reach out to them to find out more.

Additionally, the AWS Web Application Firewall (WAF) blocks most SEO-based crawlers when the default CRS (Core Rule Set) rule is applied, which will cause this error to be noted in your account. To resolve this, you should be able to add a custom rule for rogerbot with a higher priority.
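
As a rough illustration only, and assuming your firewall is AWS WAF managed as JSON (the rule name, priority, and metric name below are placeholders), a custom rule that allows requests whose User-Agent header contains rogerbot might look something like this:

    {
      "Name": "AllowRogerbot",
      "Priority": 0,
      "Action": { "Allow": {} },
      "Statement": {
        "ByteMatchStatement": {
          "FieldToMatch": { "SingleHeader": { "Name": "user-agent" } },
          "PositionalConstraint": "CONTAINS",
          "SearchString": "rogerbot",
          "TextTransformations": [ { "Priority": 0, "Type": "LOWERCASE" } ]
        }
      },
      "VisibilityConfig": {
        "SampledRequestsEnabled": true,
        "CloudWatchMetricsEnabled": true,
        "MetricName": "AllowRogerbot"
      }
    }

In AWS WAF, rules with lower priority numbers are evaluated first, so the allow rule needs to sit ahead of whichever rule is blocking crawlers. Your website administrator is the best person to confirm how your particular firewall is configured.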

Our Crawler Was Blocked by a Forbidden Response

Example of the error message seen in-app when our crawler receives a forbidden response.

If you’ve received a message that our crawler was blocked from your server by a Forbidden response, this means that when rogerbot attempted to access your site it was denied by the server. When rogerbot is blocked on a server level, it is not able to continue the crawl.

In order to resolve the issue, please ensure that both rogerbot and AWS are fully whitelisted with your server. If you are blocking rogerbot at a server level this is most likely something put in place by your website administrator, so it's best to reach out to them to find out more.
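
One quick way to see whether your server treats crawler user agents differently is to compare the status codes returned for a normal request and for one that identifies itself as rogerbot. This is just a sketch with a placeholder domain and a simplified user agent string (rogerbot's real user agent string is longer):

    curl -s -o /dev/null -w '%{http_code}\n' https://www.yoursite.com/
    curl -s -o /dev/null -w '%{http_code}\n' -A 'rogerbot' https://www.yoursite.com/

If the first request returns 200 but the second returns 403, your server, or a firewall in front of it, is almost certainly blocking rogerbot by user agent.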

Our Crawler Was Not Able to Access the Robots.txt File on Your Site

Example of the error message received in-app when our crawler can't access the robots.txt file.

Try a recrawl

If you have a Moz Pro Medium plan or above, and your crawl isn't currently in progress, you can request a crawl of your site through your Site Crawl tab.

Location of the recrawl button in your Site Crawl section at the top left.

Once the recrawl is complete, you'll get an email from us. Hopefully we've been able to crawl your site on the second attempt, which would indicate that it was a temporary issue. If the recrawl is still unsuccessful, check the steps below. You may need to speak to your hosting provider to find out if they can help.

Check your Robots.txt file

If you’ve received a message that our crawler wasn’t able to access your robots.txt file, there could be a few things going on. First, you’ll want to try and access your robots.txt file in a browser.

Access your robots.txt file by entering your site's homepage followed by /robots.txt - for example moz.com/robots.txt.

If you’re not able to access it and are getting an error, that means our crawler can’t access it either. :-( You’ll need to check with your web developer to make sure that the settings are correct so we can access your site and start pulling in that data! :-)

In addition, you can use third party tools like httpstatus.io to see if bots are able to reach your robots.txt file. If these tools are not able to access your site, it’s likely that rogerbot will have some trouble, too.

Check your redirects

Sometimes the issue here is that your site is set up at the www version but there’s no redirect in place telling our crawler to head there. If your Campaign is currently set to track all sites within your domain, our crawler will try to start at http://yoursite.com. If there isn’t a redirect in place to http://www.yoursite.com, our crawler won’t know it needs to move forward to the www page and will stop. One way to test this is to enter http://yoursite.com/robots.txt into a browser: if it doesn't redirect and you get an error saying that the page is invalid, you know there’s a problem. You can also use the httpstatus.io tool to check. If the tool comes back with an “error fetching URL” result, you’ll need to look into either fixing an existing redirect or adding one. You can check this setting within your Campaign Settings under Site Basics.
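
You can also follow the redirect chain from the command line. Here's a sketch with the placeholder domain used above; curl's -L flag follows redirects and -I requests only the headers, so you'll see the status line and Location header for each hop:

    curl -sIL http://yoursite.com/robots.txt

Ideally you'd see a 301 redirect from the HTTP URL to the version of the site you actually use, ending in a 200 response. If the chain ends in an error, or there's no redirect at all, that matches the problem described above.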

Check your server logs

You may also want to check your server logs to see what your server is returning to our crawler, rogerbot, or to see if there was an outage at the time we tried to start crawling your site. You may need to reach out to your website administrator or hosting provider for a copy of your server logs.
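
If you have shell access, a rough way to pull rogerbot's recent requests out of a typical access log is shown below; the log path is just an example and will vary by server and host:

    grep -i rogerbot /var/log/apache2/access.log | tail -n 20

The status codes recorded alongside those requests will tell you whether we received 200s, 4xx errors, or 5xx errors when we tried to crawl.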

Page Redirects Outside the Scope of Your Campaign Settings

Example of the error message seen in-app when our crawler is being pushed outside the scope of your Campaign settings.

If you receive an error saying that your page redirects or links to a page that is outside the scope of your Campaign Settings, there are a couple of things we’re going to need to check.

First, you’ll want to check the settings for your Campaign — you can do so by heading to the Campaign Settings in the left hand-navigation menu:


Under the Site Basics section, you should see the website you’re tracking with a note beneath it saying either Tracking all sites at this domain or Tracking this subfolder or subdomain. But what does that really mean, and what does it have to do with the error? Great question!

If you have your Campaign set up to track just a subdomain or subfolder, but your site redirects to another subdomain or subfolder, we won’t be able to continue the crawl. For example, say you’ve set up your Campaign to track the subdomain of cupcakeroyale.com, but there is a redirect in place to www.cupcakeroyale.com — the www version of your site now falls outside the scope of the Campaign Settings. This is the same for subfolders. Our crawler will only be able to crawl the subfolder specified in your Campaign Settings (for example, moz.com/blog), and won’t be able to access links to other subfolders (like moz.com/about). If this is what’s happening on your site, you’ll want to set up a new Campaign which is either restricted to the subdomain your site is redirecting to or which is unrestricted and tracks all sites within the domain.

Okay, but what if your Campaign Settings say you’re tracking all sites within the domain but you’re still getting this error — what then? Well that makes it a little trickier! In that case, we’ll need to take a look at your site itself and make sure there are accessible links in the source code of your homepage which are directing to other parts of your site and are not pointing externally. An easy way to find this out would be to search your source code for the href links (these are the links our crawler is able to follow) and see if they fall within your site’s root domain. For example, if you’re tracking the site meghan.com but all your href links point to lisa.com, we won’t be able to crawl those links, as they fall outside the scope of your Campaign Settings.
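
If you'd like a quick way to list those href links, here's a sketch using curl and grep with a placeholder domain; viewing your homepage's source in the browser works just as well:

    curl -s https://www.yoursite.com/ | grep -o 'href="[^"]*"' | sort -u

If most of the URLs that come back point to a different root domain (or subdomain, for a restricted Campaign) than the one you're tracking, that's the mismatch described above.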

You can also export a CSV of all your crawled pages to check the link count of your site. This can be useful when trying to determine if there is a part of your site being missed in the crawl and how our crawler is finding your links.

First head to the all crawled pages section of your Site Crawl.
The export to CSV button is on the right hand side.

You can then verify the Link Count for a page in column I. If this is zero, then we weren’t able to find any HTML links on that page to follow and the crawl wasn’t able to progress.

You’ll also want to take a look at the Redirect Target identified in column H.

If your crawl report only includes one page and the Redirect Target has a root domain (or subdomain, if your Campaign is restricted) which is different from the site you’re tracking, our crawler won’t be able to move forward and follow the redirect.

Unable to Access Your Homepage Due to a Redirect Loop

Example of the error message seen in-app when we are not able to crawl due to a redirect loop.

This can be caused by an issue with how your site or your robots.txt file is set up. To check this out, enter your site's root domain (e.g. moz.com) as well as your site's robots.txt (e.g. moz.com/robots.txt) into a third-party tool like httpstatus.io. Check that your URLs redirect to the correct location. If you're seeing any errors or incorrect redirects, then it's best to reach out to your website administrator to find out more.

A third party app can help to identify the redirect loop in place.
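
From the command line, a redirect loop usually shows up as the same Location header repeating over and over. This is a sketch with a placeholder domain; curl follows up to 50 redirects by default before giving up:

    curl -sIL http://yoursite.com/ | grep -i '^location'

If the same URLs keep appearing in the output, you've found the loop, and your website administrator will need to untangle the conflicting redirect rules.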

Only Crawling a Few Pages on Your Site

First off, check whether we're following the redirects on your homepage to the final destination. If rogerbot isn't able to follow your site redirects, you can try this workaround:

  1. Set up your Campaign to track a specific subdomain or subfolder within your site, including the www subdomain

  2. Click on the Advanced Settings option to get specific and limit the scope of your Campaign to a specific subdomain

If the crawl for this Campaign is successful then you can use this Campaign going forward. You may still want to investigate why we couldn't follow the redirects on your site by speaking with your website host or administrator.

Be sure to click Advanced Settings and check the box during Campaign setup to restrict your Campaign.

Check Link Count

Our crawler is designed to follow HTML links in your source code (a href links). If our crawler is unable to find the a href links on a page, it won’t move forward. To start troubleshooting this issue:

  1. Export a CSV of all your crawled pages to check the Link Count column
  2. If you're seeing 0 links in this column, we've not been able to locate any crawlable links
  3. Take a look at the source code for that page to see if you’re able to find an a href link on that page

Still having trouble? Is your site primarily JavaScript? This could be causing some trouble for rogerbot! Our crawler doesn’t work very well with JavaScript and sometimes has trouble parsing out the code. The good news is that even though your Site Crawl data may not be the most accurate in this case, your keyword rankings and link profile should be good to go! Those parts of your Campaign don’t rely on the Site Crawl data.

Moz Crawl Errors and Single Page Apps

If the site you’ve set up for your Campaign is a single page app, meaning that the site is all contained within one page with anchor links that point to different areas of the same homepage, you may end up seeing a crawl error in your Site Crawl. The most common crawler warning we see with this type of site states that the site redirects to a page outside the scope of your Campaign settings. This is because when the crawler attempts to find more links on your site, it can’t find any since they are all contained within one page on your site’s server. However, if you have links to social media profiles or other outbound links, it will think you’re asking it to crawl those which do, in fact, fall outside the scope of your Campaign.
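
For clarity, the anchor links in a single page app typically look like the first pair below, while the kind of links our crawler can follow to new URLs look like the second pair (illustrative markup only, with placeholder paths):

    <!-- Anchor links within a single page: -->
    <a href="#services">Services</a>
    <a href="#contact">Contact</a>

    <!-- Links to separate URLs that a crawler could follow: -->
    <a href="/services/">Services</a>
    <a href="/contact/">Contact</a>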

If we are receiving a crawl fail error for your Campaign and you know your site is a single page app, be sure to check that we were able to crawl your page. To do this head to Site Crawl > All Crawled Pages where you'll see a page count of 1 (or 2 if your site redirects to the https version). Although we don’t currently have a way to remove or stop these crawl error warnings from appearing, you can still use the Crawl data to monitor the SEO of your single page app and to collect rankings data for your site.

