What is a Log File?

A Log File records the hits (requests) that the server receives from different user agents.  Each entry includes details such as the time the request was made, the IP address, the URL requested and the user agent used.

Why is Log File Analysis useful?

Log File analysis is one of the best ways to understand exactly how different user agents are crawling your site.  It is particularly useful for understanding how much Crawl Budget is being wasted and, crucially, which URLs it is being wasted on.  It also brings any accessibility errors and other crawl deficiencies to light.  For example, if you suspect Googlebot is ignoring some pages where thin content is potentially an issue, and you need to prove this to your client, Log File analysis lets you back it up with our old friend: Data.


For the purposes of SEO, we will mainly be filtering by the Googlebot User-Agent.  (Other User-Agents are available.)  To perform a decent analysis you are going to need around 60,000-120,000 rows of Excel data.
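
As a rough illustration of that filtering step, here is a minimal Python sketch that keeps only the lines mentioning Googlebot.  The sample lines and the function names are made up for this example, and the check is a plain text match: a user-agent string can be spoofed, so treat the result as a first pass and not as proof that the request came from Google.

```python
def is_googlebot(line: str) -> bool:
    """Return True if the raw log line mentions Googlebot in any casing."""
    return "googlebot" in line.lower()


def filter_googlebot(lines):
    """Keep only the log lines whose text mentions Googlebot."""
    return [line for line in lines if is_googlebot(line)]


# Hypothetical sample lines, invented for illustration.
sample_lines = [
    '66.249.66.1 - - [10/Oct/2016:13:55:36 +0000] "GET /shoes/ HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [10/Oct/2016:13:55:40 +0000] "GET /shoes/ HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '157.55.39.1 - - [10/Oct/2016:13:56:02 +0000] "GET /hats/ HTTP/1.1" 200 4096 '
    '"-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
]

googlebot_lines = filter_googlebot(sample_lines)
print(len(googlebot_lines))
```

The same idea works on a real file by passing the open file object to filter_googlebot in place of the sample list.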

The Anatomy of a Log File

Now that you know a little about the basics of log file analysis, it is time to look at the different sections of a Log File.  Log Files are consistent in that they will almost always include the following:

  1. Server IP
  2. Date and Time
  3. Method (GET / POST)
  4. Request URI
  5. HTTP status Code
  6. The User Agent


Log Files may contain other information, such as the host name, client IP address or the bytes downloaded.
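
To make those fields concrete, here is a small Python sketch that splits one line into named parts.  It assumes the Apache/NGINX "combined" log format, which records the client IP; your server may log different fields in a different order, so the pattern, the sample line and the field names are illustrative only.

```python
import re

# Pattern for the Apache/NGINX "combined" log format. This is an assumption:
# a server can be configured to log other fields in another order.
COMBINED_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line: str):
    """Split one combined-format line into named fields, or return None."""
    match = COMBINED_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    return fields


# Hypothetical sample line, invented for illustration.
sample = (
    '66.249.66.1 - - [10/Oct/2016:13:55:36 +0000] "GET /shoes/ HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

record = parse_line(sample)
print(record["timestamp"], record["method"], record["uri"], record["status"])
```

Lines that do not fit the pattern come back as None, which makes it easy to count how many entries were skipped.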

Crawl Budget: What is it exactly?

Crawl budget is the number of pages a search engine (such as Google) allocates to crawling on your website each time it visits your site.

How is Crawl Budget determined?

Crawl budget allocation is based on the authority of your site.  In the olden days this was determined in part by PageRank.  Essentially, the more authority your site has, the more URLs will potentially be crawled.

Even if your site has a large Crawl Budget, Google may choose to ignore certain sections of your website if it sees that you are producing thin or low-quality content on a large scale.

Log File Analysis Tools: Screaming Frog Log File Analyser & Splunk

The Screaming Frog Log File Analyser is one of the most user-friendly ways to analyse your Log Files.  As this product is made specifically for SEOs it is the one I recommend above all others, although if you are looking for alternatives there is always Splunk, which is also reasonably user-friendly.  (Splunk does not, however, let you simply drag and drop a log file from a server such as Apache, IIS or NGINX onto the tool for fast analysis.)

Image: A screenshot from the recently launched Screaming Frog Log File Analyser

How to Merge Log Files

When you download your Log Files you may get multiple files.  To merge these files, simply put them all into one folder and fire up the command line.

  • Click ‘run’ on the start bar.
  • Type cmd and press return.
  • Type in cd Desktop/your folder name
  • To merge multiple Log Files to CSV type the following:  copy *.log mergedlogs.csv
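
If you would rather script the merge step, the following Python sketch does roughly the same job.  The function name merge_logs and the demo file names are made up for this example.  Note that this only joins the files end to end; giving the result a .csv extension does not change how the lines inside are structured.

```python
from pathlib import Path
import tempfile


def merge_logs(folder: Path, output_name: str = "mergedlogs.csv") -> Path:
    """Append every *.log file in folder, in name order, to one output file."""
    output_path = folder / output_name
    with output_path.open("w", encoding="utf-8") as merged:
        for log_path in sorted(folder.glob("*.log")):
            text = log_path.read_text(encoding="utf-8", errors="replace")
            merged.write(text)
            # Make sure the next file starts on a fresh line.
            if text and not text.endswith("\n"):
                merged.write("\n")
    return output_path


# Demonstration with two tiny, made-up log files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "access-1.log").write_text("line one\n", encoding="utf-8")
    (folder / "access-2.log").write_text("line two", encoding="utf-8")
    merged_text = merge_logs(folder).read_text(encoding="utf-8")

print(merged_text)
```

To use it on real files, call merge_logs with the path of the folder that holds your downloaded logs.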

Log File Analysis – Key Objectives

  • Find out which pages on your site are being crawled the most. Are these the pages you want crawled the most?
  • Find out which pages on your site are not being crawled at all, and are ‘orphan’ pages.
  • Discover if your XML Sitemap URLs are being crawled
  • See if your news sitemap is being checked by Googlebot
  • Find out if paginated pages are being crawled, compared with your core category pages
  • Assess the impact of new inbound links on the crawl rate
  • Discover how quickly a newly launched site or site section is being crawled
  • See if crawlers are spending large amounts of time crawling URLs that add no actual SEO value
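
A few of the objectives above can be sketched in Python once the Googlebot requests have been pulled out of the log.  Everything here is hypothetical: the requested URLs, the sitemap URLs, and the rule that any URL containing "page=" counts as a paginated page.

```python
from collections import Counter

# Hypothetical URLs requested by Googlebot, as they might look after the
# log lines have been parsed and filtered.
googlebot_uris = [
    "/shoes/",
    "/shoes/?page=2",
    "/shoes/",
    "/old-offer/",
    "/shoes/?page=3",
    "/shoes/",
]

# Hypothetical URLs listed in the XML sitemap.
sitemap_urls = {"/shoes/", "/hats/", "/coats/"}

# Which pages are being crawled the most?
crawl_counts = Counter(googlebot_uris)
most_crawled = crawl_counts.most_common(3)

# Which sitemap URLs never appear in the Googlebot requests?
never_crawled = sorted(sitemap_urls - set(crawl_counts))

# What share of requests goes to paginated URLs?
# Assumption: a paginated URL is any URL containing "page=".
paginated_hits = sum(1 for uri in googlebot_uris if "page=" in uri)
paginated_share = paginated_hits / len(googlebot_uris)

print(most_crawled)
print(never_crawled)
print(f"{paginated_share:.0%}")
```

On a real site the pagination rule and the sitemap list would need to match your own URL structure.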

Useful Links

https://builtvisible.com/log-file-analysis/ – Very thorough and useful resource for log file analysis.

Published by Chris

Chris is a London SEO Consultant working as an SEO Account Director for Blue 449, part of Publicis Groupe.
