
Crawl Budget Optimization: Get More Pages Indexed on Large Sites

May 12, 2026 · 9 min read · Updated May 12, 2026
Quick Answer

Crawl budget is the number of pages Googlebot will fetch from your site within a given time window. For sites under 10,000 pages it rarely matters; for larger sites it caps how much can be indexed. The two levers are crawl capacity (server speed, error rate) and crawl demand (page popularity, freshness). The Google Indexing API sidesteps crawl demand for nominated URLs by requesting a priority crawl of each one.

Crawl budget math

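Google publishes the framework: crawl budget is set by two factors, crawl capacity and crawl demand. "Capacity × demand" is a useful shorthand rather than a literal formula; Google describes crawl budget as the set of URLs it can and wants to crawl. Crawl capacity is what your server can handle; crawl demand is what Google wants to fetch. Both are dynamic and you can influence both.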

For a 1,000-page site, Google's typical crawl rate is high enough that crawl budget never becomes the bottleneck. For a 100,000-page site, you might see Googlebot fetch around 30,000 URLs per day at peak, meaning roughly a third of your site gets refreshed daily. For a 10-million-page site, that fraction shrinks to a single-digit percentage.
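To make the arithmetic concrete, here is a minimal sketch. The function name is illustrative, and it assumes every fetch lands on a distinct URL, which is optimistic because Googlebot re-fetches popular pages far more often than deep ones.

```python
import math

def crawl_coverage(total_pages: int, fetches_per_day: int) -> tuple[float, int]:
    """Return (share of the site fetched per day, days for one full pass).

    Assumes each fetch hits a different URL, so this is a best case.
    """
    share = min(fetches_per_day / total_pages, 1.0)
    days_for_full_pass = math.ceil(total_pages / fetches_per_day)
    return share, days_for_full_pass

share, days = crawl_coverage(100_000, 30_000)
print(f"{share:.0%} per day, full pass in about {days} days")
# 30% per day, full pass in about 4 days
```

Plug in your own page count and the daily request figure from Crawl Stats to see how long a full refresh would take in the best case.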

When crawl budget actually matters

Diagnose: GSC Crawl Stats report

Open Search Console → Settings → Crawl Stats. You'll see three charts: total requests, total download size, and average response time. Healthy patterns:

HEADS UP
If crawl requests are dropping while you are adding pages, that's the alarm. Either your server is throttling Googlebot, your robots.txt rules now block more URLs than before, or Google has decided the new content isn't worth more crawl.
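One way to automate that check, as a rough sketch: compare the first and last weekly totals you export from Crawl Stats against your own page counts. The function name and the 20% drop threshold are assumptions, not anything Google defines.

```python
def crawl_alarm(
    weekly_requests: list[int],
    weekly_page_counts: list[int],
    drop_threshold: float = 0.2,  # assumed cutoff; tune it for your site
) -> bool:
    """Return True when crawl requests fell noticeably while the site grew."""
    if len(weekly_requests) < 2 or len(weekly_page_counts) < 2:
        return False  # not enough history to compare
    first_req, last_req = weekly_requests[0], weekly_requests[-1]
    if first_req == 0:
        return False  # no baseline to measure a drop against
    pages_grew = weekly_page_counts[-1] > weekly_page_counts[0]
    drop = (first_req - last_req) / first_req
    return pages_grew and drop > drop_threshold

# Requests fell from 70k to 50k per week while the site grew by 10k pages.
print(crawl_alarm([70_000, 60_000, 50_000], [100_000, 105_000, 110_000]))
# True
```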

The four crawl-waste patterns

1. Faceted navigation

/products?color=red&size=large&sort=price-asc generates thousands of URL combinations, most of which are near-duplicates. Each one consumes crawl budget and dilutes ranking signals.

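Fix: point the facet URLs back to the unfaceted version with rel="canonical". For high-value facets you want indexed (e.g., /products/red-shoes), create static URLs with proper internal linking. Block low-value facet combinations in robots.txt, and pick one treatment per URL pattern: Google can't read a canonical tag on a URL it is blocked from crawling.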
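To see how quickly facets multiply, here is a small sketch that enumerates every combination for a hypothetical catalog. The facet names and values are made up for illustration.

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical facets; real catalogs usually have more of each.
facets = {
    "color": ["red", "blue", "green", "black"],
    "size": ["small", "medium", "large", "xl"],
    "sort": ["price-asc", "price-desc", "newest"],
}

def facet_urls(base: str, facets: dict[str, list[str]]):
    """Yield every URL where each facet is either unset or set to one value."""
    names = list(facets)
    # None stands for "facet not applied".
    options = [[None] + facets[name] for name in names]
    for combo in product(*options):
        params = [(n, v) for n, v in zip(names, combo) if v is not None]
        yield base + ("?" + urlencode(params) if params else "")

urls = list(facet_urls("/products", facets))
print(len(urls))  # 5 * 5 * 4 = 100
```

Three small facets already produce 100 URLs for one listing page. Add two more facets with ten values each and the count is multiplied by 11 × 11, giving 12,100.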

2. URL parameters

Tracking parameters (?utm_source, ?ref, ?sessionId) create unlimited URL variants of the same page. Handle them with canonicals and avoid emitting tracked URLs in your own internal links.
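A minimal sketch of the internal-link side of that fix: strip tracking parameters before a URL is written into a template or a canonical tag. The parameter list is an assumption; extend it with whatever your analytics stack appends.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; adjust for your own setup.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"ref", "sessionid"}

def strip_tracking(url: str) -> str:
    """Return the URL with tracking parameters removed, other parameters kept."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_NAMES
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(strip_tracking("https://yoursite.com/products?color=red&utm_source=news&sessionId=abc"))
# https://yoursite.com/products?color=red
```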

3. Internal duplicates

Common culprits: print versions, dev/staging URLs accidentally exposed, and paginated archives where each page is too thin to stand alone. Canonical to the primary version, robots-disallow the rest, or consolidate.

4. Long redirect chains

Every hop in a redirect chain is a separate fetch. /a → /b → /c → /d costs four fetches (three redirects plus the final page) to reach one destination. Flatten redirects to single hops.
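If your redirects live in a lookup table, flattening can be scripted. This is a sketch that assumes the rules are available as a simple source-to-target mapping; the function name is illustrative.

```python
def flatten_redirects(redirects: dict[str, str]) -> dict[str, str]:
    """Map every source URL directly to its final destination."""
    flat = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        # Follow the chain until we reach a URL that does not redirect.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        flat[source] = target
    return flat

chain = {"/a": "/b", "/b": "/c", "/c": "/d"}
print(flatten_redirects(chain))
# {'/a': '/d', '/b': '/d', '/c': '/d'}
```

After flattening, each old URL reaches /d in a single redirect, and loops surface as errors instead of silently wasting fetches.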

Server performance optimization

Crawl capacity scales with server speed. Real improvements you can make this week:

Sitemap segmentation for large sites

One mega-sitemap with 50,000 URLs is hard for Google to prioritize. Split into segmented sitemaps with a sitemap index:

xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://yoursite.com/sitemap-products.xml</loc>
    <lastmod>2026-05-12</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yoursite.com/sitemap-categories.xml</loc>
    <lastmod>2026-05-12</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://yoursite.com/sitemap-blog.xml</loc>
    <lastmod>2026-05-12</lastmod>
  </sitemap>
</sitemapindex>

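Each segment can have its own lastmod, which Google can use to prioritize re-crawling, provided the dates are consistently accurate. Update lastmod only when that segment's content actually changes.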
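Generating the index from code keeps lastmod values honest. The sketch below uses Python's standard library; the function name is illustrative, and it assumes you already know the latest modification date for each segment.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap_index(segments: dict[str, str]) -> str:
    """Build a sitemap index from {sitemap URL: lastmod as YYYY-MM-DD}."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in segments.items():
        node = ET.SubElement(root, "sitemap")
        ET.SubElement(node, "loc").text = loc
        ET.SubElement(node, "lastmod").text = lastmod
    ET.indent(root)  # pretty-print; requires Python 3.9+
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

print(build_sitemap_index({
    "https://yoursite.com/sitemap-products.xml": "2026-05-12",
    "https://yoursite.com/sitemap-categories.xml": "2026-05-12",
    "https://yoursite.com/sitemap-blog.xml": "2026-05-12",
}))
```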

Force-index priority URLs via Indexing API

The strategic move on large sites: stop trying to make Google crawl your entire site faster. Instead, pick your highest-value URLs and push them through the Indexing API. Examples:

This is where Instant URL Indexer's bulk submit shines for enterprise sites: 500 URLs per request, integrate it into your CMS publish hook, and your priority pages bypass the crawl-budget bottleneck entirely.

NOTE
Enterprise tip: send the API a URL only on real content changes, not on every CMS save. Otherwise you'll burn through credits on cosmetic edits and dilute the priority signal.
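One way to tell a real content change from a cosmetic save is to fingerprint the fields that matter and submit only when the fingerprint changes. This is a sketch: the function names and the in-memory store are assumptions, and the actual submission call is left out because it depends on your client.

```python
import hashlib

def content_fingerprint(title: str, body: str) -> str:
    """Hash only the fields that count as a real content change."""
    normalized = " ".join(f"{title}\n{body}".split())  # collapse whitespace
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def should_submit(url: str, title: str, body: str, last_sent: dict[str, str]) -> bool:
    """Return True, and record the new fingerprint, only when content changed."""
    fingerprint = content_fingerprint(title, body)
    if last_sent.get(url) == fingerprint:
        return False
    last_sent[url] = fingerprint
    return True

def batches(urls: list[str], size: int = 500):
    """Split URLs into groups of at most `size` for bulk submission."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]

store: dict[str, str] = {}  # in production this would be a database table
print(should_submit("/pricing", "Pricing", "Plans from $10", store))      # True
print(should_submit("/pricing", "Pricing", "Plans  from $10\n", store))   # False
```

Call should_submit from the CMS publish hook, collect the URLs that return True, and send them in batches.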

Monitoring crawl budget over time

Set up weekly monitoring on:

Frequently Asked Questions

How do I know if crawl budget is my problem?

Run a site:yoursite.com query and compare the result count to your actual URL count. The count is only a rough estimate, so treat it as a sanity check: if the gap is large and persistent on a large site, crawl budget is a likely cause. Then check the Page indexing report in GSC (formerly Coverage). A high count of "Discovered – currently not indexed" URLs is a stronger sign that Google knows about the pages but hasn't crawled them yet.

Can the Indexing API replace crawl-budget optimization?

For priority URLs, yes. For the long tail, no. You can't push 100,000 URLs through the API monthly. Use the API for high-value pages and fix crawl-budget waste so Google can crawl the rest naturally.

Does blocking pages in robots.txt save crawl budget?

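Yes, but with a catch: if a blocked URL has external backlinks, Google may still index the bare URL (without a snippet), because it never gets to fetch the page. Noindex is not a crawl-budget fix either: Google has to crawl a page to see the tag, so it removes the page from the index without saving the fetch. Use robots.txt for URLs you never want crawled, noindex for crawlable pages you want out of the index, and stop linking internally to both.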

How fast is the Indexing API for a 100k URL site?

Each submission is processed in 30–90 seconds regardless of site size. The bottleneck for large sites is the daily quota (200 URLs/day default for the direct API; 500 per submit with multiple submits/day for Instant URL Indexer).
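A quick back-of-the-envelope check of what that quota means, as a sketch using the 200 URLs/day default quoted above (the function name is illustrative):

```python
import math

def days_to_submit(url_count: int, daily_quota: int) -> int:
    """Days needed to push every URL through once at a fixed daily quota."""
    return math.ceil(url_count / daily_quota)

print(days_to_submit(100_000, 200))  # 500
```

At the default quota a single pass over 100,000 URLs would take 500 days, which is why the API suits priority pages rather than the long tail.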
