
    The Moz Q&A Forum


    What happens to crawled URLs subsequently blocked by robots.txt?

    Intermediate & Advanced SEO
    • AspenFasteners

      We have a very large store with 278,146 individual product pages. Since these are all various sizes and packaging quantities of fewer than 200 product categories, my feeling is that Google would be better off making sure our category pages are indexed.

      I would like to block all product pages via robots.txt until we are sure all category pages are indexed, then unblock them. Our product pages rarely change and have no ratings or product reviews, so there is little reason for a search engine to revisit a product page.

      The sales team is afraid that blocking a previously indexed product page will result in it being removed from the Google index, and would prefer to submit the categories by hand, 10 per day, via requested crawling.

      Which is the better practice?

      • seoelevated @AspenFasteners

        @aspenfasteners To my understanding, disallowing a page or folder in robots.txt does not remove pages from Google's index. It merely gives a directive not to crawl those pages/folders. In fact, when pages are accidentally indexed and one wants to remove them from the index, it is important to actually NOT disallow them in robots.txt, so that Google can crawl those pages and discover the meta NOINDEX tags on the pages. The meta NOINDEX tag is the directive to remove a page from the index, or not to index it in the first place. This is different from a robots.txt directive, which is intended to allow or disallow crawling. Crawling does not equal indexing.
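        To make the distinction concrete, here is a minimal sketch of the two mechanisms (the path is a hypothetical placeholder):

        ```
        # robots.txt — controls CRAWLING only; does not remove already-indexed pages
        User-agent: *
        Disallow: /products/
        ```

        ```html
        <!-- meta noindex — controls INDEXING; the bot must be ABLE to crawl
             the page to see this tag, so don't also disallow it in robots.txt -->
        <meta name="robots" content="noindex, follow">
        ```

        Note that combining the two defeats the second: if robots.txt blocks the crawl, the noindex tag is never seen.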

        So, you could keep the pages indexable and simply block them in your robots.txt file, if you want. If they've already been indexed, they should not disappear quickly (they might over time, though). BUT if they haven't been indexed yet, this would prevent them from being discovered.

        All of that said, from reading your notes, I don't think any of this is warranted. The speed at which Google discovers pages on a website is very fast. And existing indexed pages shouldn't really get in the way of new discovery. In fact, they might help the category pages be discovered, if they contain links to the categories.

        I would create a categories sitemap XML file, reference it in your robots.txt, and let that do the work of prioritizing the categories for crawling/discovery and indexation.
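        As a sketch of that suggestion, here is a minimal Python script that builds a categories-only sitemap in the standard sitemaps.org format (the category URLs are hypothetical placeholders):

        ```python
        # Minimal sketch: generate a categories-only sitemap.xml.
        # The URLs below are hypothetical; substitute your real category pages.
        from xml.etree.ElementTree import Element, SubElement, tostring

        SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

        def build_sitemap(urls):
            # One <url><loc>…</loc></url> entry per category page
            urlset = Element("urlset", xmlns=SITEMAP_NS)
            for url in urls:
                entry = SubElement(urlset, "url")
                SubElement(entry, "loc").text = url
            return tostring(urlset, encoding="unicode")

        category_urls = [
            "https://www.example.com/category/hex-bolts/",  # hypothetical
            "https://www.example.com/category/washers/",    # hypothetical
        ]

        xml = build_sitemap(category_urls)
        print(xml)
        ```

        You would then point crawlers at the generated file with a `Sitemap:` line in robots.txt, e.g. `Sitemap: https://www.example.com/categories-sitemap.xml` (URL hypothetical).
        
        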

        • terentyev @AspenFasteners

          @aspenfasteners to answer your question: "do we KNOW that Google will immediately de-index URLs blocked by robots.txt?"

          Google will not immediately de-index URLs that are blocked by robots.txt, based on my experience. I've dealt with a very similar situation, but at much greater scale: around 8M automatically generated pages that got into Google's index. It may take a year or more to de-index these pages completely. Of course, every case is different, but based on my understanding, if you block these low-quality product pages, Google will slowly start re-evaluating them, beginning with the ones that get some traffic.

          Here is what happens when Google re-evaluates your individual product pages:

          When deciding whether to keep a page in its index or not, Google takes into account multiple factors, and one of the most important is how many backlinks (both internal and external) lead to a page. Other factors include content quality, whether the page is similar or duplicate to another page, Core Web Vitals scores, the size of your crawl budget, and, of course, external backlinks (which are largely irrelevant in your case).

          If you are afraid of losing some traffic that comes to these product pages, or you have other concerns, just run a smaller experiment: take a sample of 1,000-2,000 pages, block them in robots.txt or add a meta robots "noindex, follow" directive, and observe Google's reaction over 1-6 weeks, depending on your crawl budget.

          Another thing to check:

          If you use Screaming Frog, it has a nice feature that shows internal PageRank and the number of internal incoming links that lead to every page. As a rule of thumb, if an individual product page has at least 10 internal incoming links from canonicalized pages, there is a high probability it will get indexed.

          • AspenFasteners @terentyev

            @terentyev - sorry, I can't edit my questions once submitted while I wait for approval (why?). The statement should read: my question SHOULD be very specific, whereas my original question was much more general, and you answered that question very nicely. Sorry for any misunderstanding.

            • AspenFasteners @terentyev

              @terentyev thanks for the reply. We have no reason to believe these URLs are backlinked. These aren't consumer products that individuals are interested in; ours is a wholesale B2B site selling very narrow categories in bulk quantities, typically for manufacturing. Therefore, there is almost zero chance of backlinks anywhere for something as specific as a particular size/material/package quantity of a product.

              We have already started a canonicalization project, but we are stuck between two concerns from sales: 1) we can't wait for canonicalization (which is complex), we need sales now, and 2) don't touch robots.txt because MAYBE the individual products are indexed.

              So that is why my question is very specific: do we KNOW that Google will immediately de-index URLs blocked by robots.txt?

              • terentyev @AspenFasteners

                @aspenfasteners thanks for the interesting question.
                To summarize my understanding:

                1. You have ~300K individual product pages, many of them duplicates; e.g., a single product can have multiple characteristics (such as size or quantity) but the pages are essentially the same.
                2. Your goal is to index the 200 product categories that contain collections of these products, and to remove the low-quality duplicate individual pages from Google's index in the long run.
                3. My assumption is that these 300K product pages have historically accumulated some backlinks, which is one of the reasons why they are indexed.

                If I am right about points 1 and 2, then you should not block these individual product pages, but rather add canonical URLs to them, pointing to the respective category page that you want indexed.

                Once you have these canonicals implemented, you should wait a few months or more for Google to pass the link equity to your 200 product category pages. Once that is done, you are free to block the product pages via robots.txt plus a meta tag on the page itself, and maybe even an x-robots-tag HTTP header. How exactly to block them is a different discussion; let me know if you want to learn more about the best approach.

                So, here is my checklist for this URL migration:

                1. Add canonicals pointing from product pages to category pages.
                2. Make sure that all category pages are well interlinked with each other, and that the individual product pages link to several category pages (e.g., product A should link to category A and also to similar categories B & C). As a rule of thumb, make sure that each category page has at least 10 incoming links from other category pages.
                3. Make sure that all these category pages are linked from your homepage.
                4. Make sure that the sitemap contains only self-canonicalized pages.
                5. Make sure that these category pages have good Core Web Vitals metrics compared to your competitors on the SERP.
                6. In 2-3 months, when you see that Google has indexed the category pages, crawling of product pages has been reduced significantly, and the rankings of the category pages have gone up, it is OK to block these 300K pages from crawling.
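                Step 1 of the checklist boils down to a single element in the head of each product page; a minimal sketch with hypothetical URLs:

                ```html
                <!-- On a product-variant page, e.g. /products/hex-bolt-m8-x-50-qty-100/
                     (hypothetical path), point the canonical at the category page
                     you want indexed instead of the variant itself. -->
                <link rel="canonical" href="https://www.example.com/category/hex-bolts/">
                ```

                Note that a canonical is a hint rather than a directive, which is part of why the checklist also asks for strong internal linking to the category pages.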

                As to manually submitting the categories by hand, I doubt it will help, especially if the product pages have a lot of backlinks. I've seen many cases where Google disregards robots.txt directives when a page has good backlinks and traffic.

                © 2021 - 2023 SEOMoz, Inc., a Ziff Davis company. All rights reserved. Moz is a registered trademark of SEOMoz, Inc.
