Web Scraping & Data Extraction with Screaming Frog
What is Web Scraping?
Web scraping, also known as web data extraction or screen scraping, is used to extract large amounts of data from websites. The data can then be exported into spreadsheets or databases for further analysis.
Why do I need web scraping for SEO?
- Content Idea Inspiration and Research
- Understanding Competitors' Content Strategy
- Creating alt text entries for 1000s of images quickly
- Collect plain text
- Google Analytics IDs
- Schema Markup
- Social Meta Tags (Open Graph Tags, Twitter Cards)
- Mobile Annotations
- Comment Scraping
- Email Scraping
- Hreflang code
- Prices of Products
- Stock Availability
A Beginner's Guide to Web Scraping with Screaming Frog
Web scraping is one of the lesser-used features of the Screaming Frog SEO Spider, but it is a useful trick to have up your sleeve when you need to extract large amounts of data from the HTML of a webpage.
Screaming Frog is by no means the only tool that you can use for web scraping (Python is generally considered the go-to solution). But for beginners, Screaming Frog provides all the features you need to extract data using CSS Path, XPath and regex.
The three methods of web scraping with Screaming Frog are:
- XPath – This option allows you to scrape data using XPath selectors. Recommended for most web scraping scenarios.
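- CSS Path – CSS selectors are patterns used to select elements, and they allow you to scrape data quickly. Recommended for most web scraping scenarios.
- Regex – Regular expressions match text patterns in the page source. Best kept for more advanced cases, such as data inside HTML comments or inline JavaScript.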
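To give a feel for what these selectors actually do, here is a minimal Python sketch that runs an XPath-style lookup and a regex against the same page. It is only an illustration, not how Screaming Frog works internally: the sample HTML, the product name and the analytics ID are all made up, and the standard library's ElementTree supports only a small subset of XPath and needs well-formed markup.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (real pages are rarely this tidy).
SAMPLE_HTML = """<html>
  <head>
    <title>Blue Widget</title>
    <meta property="og:title" content="Blue Widget | Example Shop" />
    <script>ga('create', 'UA-12345678-1', 'auto');</script>
  </head>
  <body>
    <h1>Blue Widget</h1>
    <span class="price">19.99</span>
  </body>
</html>"""

root = ET.fromstring(SAMPLE_HTML)

# XPath-style lookup: the text of the element with class="price".
price = root.find(".//span[@class='price']").text

# XPath-style lookup: an attribute value from a social meta tag.
og_title = root.find(".//meta[@property='og:title']").get("content")

# Regex: match a Universal Analytics style ID anywhere in the raw source.
ga_ids = re.findall(r"UA-\d+-\d+", SAMPLE_HTML)

print(price)
print(og_title)
print(ga_ids)
```

The XPath lookups rely on the structure of the page (an element, a class, an attribute), while the regex ignores structure and simply matches text, which is why it can reach values buried inside a script block.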
How to web scrape with Screaming Frog
- Click on Configuration > Custom > Extraction. This opens a new extractor window, which contains 10 separate inactive extractors.
- Inspect an element on a webpage (in Chrome, right-click and choose ‘Inspect’) and find the specific data that you want to pull. Then copy its CSS Path or XPath, or write a Regex. (These are the three methods for web scraping that Screaming Frog accepts.)
- Enter the syntax into the relevant fields in the extractor window.
- If your syntax is valid, a green tick will appear next to the input fields.
- Close the extractor window and go back to the main Screaming Frog screen. Enter the URL of the website that you want to scrape and click Start.
- Once the crawl has completed, you will be able to view your data under the Custom tab by selecting the Extraction filter.
- Export the data into Excel.
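If you would rather check the export with a script than in Excel, the sketch below shows one possible approach in Python. It assumes the export was saved as CSV with the URL in an "Address" column and the scraped value in a column named after your extractor; the "Price 1" header, the URLs and the helper name split_by_extraction are hypothetical, so adjust them to match your own file.

```python
import csv
import io

# Hypothetical export contents; real column headers depend on your extractor names.
EXPORT_CSV = """Address,Price 1
https://example.com/blue-widget,19.99
https://example.com/red-widget,
https://example.com/green-widget,24.50
"""

def split_by_extraction(csv_text, column):
    """Return (found, missing): URL -> value for scraped pages, and URLs with no value."""
    found, missing = {}, []
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(column) or "").strip()
        if value:
            found[row["Address"]] = value
        else:
            missing.append(row["Address"])
    return found, missing

found, missing = split_by_extraction(EXPORT_CSV, "Price 1")
print(found)
print(missing)
```

Pages that end up in the missing list are worth a second look: either the data really is absent from the page, or the selector did not match that page's template.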