Crawl Budget Optimisation

What is Crawl Budget?

Crawl budget is the number of pages a search engine allocates to crawling a given website. The number of pages Google crawls depends on the authority of your site. In an interview with Eric Enge, when asked about crawl budget, Matt Cutts said:
“… the number of pages that we crawl is roughly proportional to your PageRank”

Does your website have a crawl budget issue?

 

Use the list below to find out if your website has a crawl budget issue:

  1. Use your XML Sitemap to determine how many URLs you have on your site
  2. Go to Crawl > Crawl Stats in Google Search Console and record the average pages crawled per day.
  3. Divide the number of URLs from step 1 by the average pages crawled per day from step 2.
  4. If you end up with a number higher than around 10 you should optimise your crawl budget.
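
The arithmetic in the checklist above can be sketched in a few lines of Python. The function names and the sample figures are hypothetical and only there for illustration; the threshold of 10 is the rough rule of thumb from step 4, not an official Google number.

```python
def crawl_budget_ratio(total_urls: int, avg_crawled_per_day: float) -> float:
    """Return the number of sitemap URLs divided by average pages crawled per day."""
    if avg_crawled_per_day <= 0:
        raise ValueError("avg_crawled_per_day must be positive")
    return total_urls / avg_crawled_per_day


def needs_optimisation(total_urls: int, avg_crawled_per_day: float,
                       threshold: float = 10.0) -> bool:
    """Apply the rough 'higher than around 10' rule of thumb."""
    return crawl_budget_ratio(total_urls, avg_crawled_per_day) > threshold


# Hypothetical figures: 50,000 URLs in the sitemap, 2,500 pages crawled per day.
print(crawl_budget_ratio(50_000, 2_500))   # 20.0
print(needs_optimisation(50_000, 2_500))   # True
```

With these made-up numbers the ratio is 20, so the site would be a candidate for crawl budget optimisation.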

 

How to increase your website’s crawl budget

There are at least 4 key ways to increase your crawl budget:

  1. Reduce Status Code Errors: 4XXs, 5XXs
  2. Block access in Robots.txt to unnecessary pages
  3. Eliminate Redirect Chains
  4. Link Acquisition

 

1. Reduce Status Code Errors

4XX and 5XX errors waste the time of Googlebot and other crawlers. They are dead ends and can be easily identified with a crawler such as Botify or Screaming Frog, or in Google Search Console.
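
As a minimal sketch, assuming you have exported a list of (URL, status code) pairs from one of these tools, the error URLs can be grouped like this. The function name and the sample data are hypothetical, and the export format of any particular tool will differ.

```python
from collections import defaultdict


def find_error_urls(crawl_results):
    """Group URLs that returned a 4XX or 5XX status code by status class."""
    errors = defaultdict(list)
    for url, status in crawl_results:
        if 400 <= status < 500:
            errors["4XX"].append(url)
        elif 500 <= status < 600:
            errors["5XX"].append(url)
    return dict(errors)


# Hypothetical crawl export for illustration only.
sample = [
    ("https://example.com/", 200),
    ("https://example.com/old-page", 404),
    ("https://example.com/api", 503),
]
print(find_error_urls(sample))
```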

 

2. Block access in Robots.txt to unnecessary pages

Robots.txt can be used to discourage search engines from crawling certain sections of your site. It is therefore a useful way to signpost to Googlebot and other search engine bots exactly what should be crawled on your website.
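
One way to sanity-check a set of rules before deploying them is Python's standard urllib.robotparser module. The rules and URLs below are hypothetical examples, not a recommendation for any specific site, and the standard-library parser only does simple prefix matching, so it will not behave exactly like Googlebot for wildcard rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks internal search and basket pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /basket/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, user_agent="Googlebot"):
    """Return True if the parsed rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


print(is_crawlable("https://example.com/search/?q=shoes"))  # False
print(is_crawlable("https://example.com/products/shoes"))   # True
```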

 

3. Eliminate Redirect Chains

Redirect chains can be frustrating for Googlebot. They are not always followed immediately and can take a long time to crawl, so keep them to an absolute minimum. Tools such as Botify are useful for identifying these problems site-wide, while the Ayima Redirect Path Chrome plugin is useful for ad hoc on-page checks.
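
To illustrate what a chain looks like, here is a small sketch that walks a redirect map and counts the hops. It assumes you already have a dictionary of source URL to target URL (for example, built from a crawl export); the function name and the sample URLs are hypothetical.

```python
def redirect_chain(start_url, redirects, max_hops=10):
    """Follow start_url through a {source: target} map and return every URL visited."""
    chain = [start_url]
    seen = {start_url}
    current = start_url
    while current in redirects and len(chain) <= max_hops:
        current = redirects[current]
        chain.append(current)
        if current in seen:  # redirect loop detected, stop following
            break
        seen.add(current)
    return chain


# Hypothetical redirect map: http -> https -> trailing slash -> new URL.
redirects = {
    "http://example.com/a": "https://example.com/a",
    "https://example.com/a": "https://example.com/a/",
    "https://example.com/a/": "https://example.com/b/",
}
chain = redirect_chain("http://example.com/a", redirects)
print(len(chain) - 1, "hops:", " -> ".join(chain))
```

Any URL with more than one hop is a candidate for pointing the first URL straight at the final destination.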

 

4. Link Acquisition 

Link acquisition, link earning, link building: call it what you like, but it is still an essential part of SEO. Going back to Matt Cutts’ original quote, a strategy that aims to increase the number of high quality links to your site is the long route to increasing your crawl budget over time.

 

Key Factors Affecting Crawl Budget

Google have stated in a recent blog that having a large number of low value URLs can negatively affect a site’s crawling and indexing. If anything, it sometimes seems that the fewer, higher value content pages you have on your site, the more it is valued. One example is Backlinko.com, which currently has only 78 pages indexed.

Google actually go as far as to define the categories of what they consider to be a low value URL:

 

  1. Faceted Navigation and Session Identifiers – Although Faceted Navigation is often useful for users on e-commerce sites to narrow down their search, it can create a labyrinth for Googlebot to navigate. Faceted Navigation can also often cause duplicated content if not correctly implemented.
  2. On-Site Duplicate Content
  3. Soft Error Pages – Soft 404s can easily be identified in the crawl errors section of Google Search Console. Soft 404s can limit a site’s crawl coverage.
  4. Hacked Pages
  5. Infinite spaces
  6. Low Quality Spam Content 

 

Essentially it seems to boil down to making sure you are not wasting Googlebot’s time, which makes complete sense from Google’s perspective.

How do I understand what pages Googlebot is crawling on my website?

Server log files: Once you have access to the server log files, you can analyse them using a program such as Screaming Frog’s Log File Analyser.
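
If you would rather look at the raw logs yourself, the idea can be sketched in Python. This assumes the logs are in the common "combined" access log format and simply matches the string "Googlebot" in the user agent; the pattern, function name and sample lines are all hypothetical. User agents can be spoofed, so this sketch does not verify that a hit really came from Google.

```python
import re
from collections import Counter

# Matches the request, status code, and user agent of a combined-format log line.
LOG_PATTERN = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(log_lines):
    """Count how many times each path was requested by a Googlebot user agent."""
    hits = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match and "Googlebot" in match.group("agent"):
            hits[match.group("path")] += 1
    return hits


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Hypothetical log lines for illustration only.
sample_log = [
    f'66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /products/shoes HTTP/1.1" 200 2326 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/Oct/2023:13:56:02 +0000] "GET /search/?q=red HTTP/1.1" 200 1500 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [10/Oct/2023:14:01:10 +0000] "GET /products/shoes HTTP/1.1" 200 2326 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.7 - - [10/Oct/2023:14:02:00 +0000] "GET /products/shoes HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

for path, count in googlebot_hits(sample_log).most_common():
    print(count, path)
```

Sorting by hit count makes it easy to spot low value URLs, such as internal search pages, that are absorbing a large share of the crawl.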

Or, you can simply check in Google Search Console.

 

Crawl Budget Common Questions

  1. Q: Is it possible to control crawling with the crawl-delay directive in robots.txt? A: No, this is not followed by Googlebot.
  2. Q: Can nofollow links affect crawl budget? A: Google say that any link that is crawled affects crawl budget.
  3. Q: Do redirect chains affect crawl budget? A: Yes, redirect chains are likely to have a negative effect on crawl budget.
  4. Q: Do AMP pages, hreflang and embedded content affect crawl budget? A: These pages will consume crawl budget.

Watch the Crawl Budget Video by Neil Patel and Eric Siu for more information on Crawl Budget

About the Author

Chris

Chris is a London SEO Consultant working as an SEO Account Director for Blue 449, part of Publicis Groupe.
