What is website crawling?
Website crawling is done by a bot which peruses web pages in order to make an index. Search engines do it so they can understand your site, its content, and how to properly funnel it to their users.
Website crawling and indexing are the first two steps of SEO. Search engines crawl your website, then index it, then feed it through an algorithm to deliver it as a result when people search for your service. Weeeeelll… they’ll deliver your site if you’ve done your SEO homework. If search engines don’t know how to categorize you, they can’t recommend your site or they’ll recommend it in the wrong places.
Search engines such as Google, Bing, and DuckDuckGo are nothing but oversized, modern card catalogs for the internet. Crawling is the act of surveying sites so search engines can organize them into the system, making sure they’re filed right. There’s no Dewey Decimal System, and the process is rather opaque, but with the proper SEO you can make sure website crawlers understand your content the way you want them to.
How website crawling works
A program called a bot (or crawler, or spider) thumbs through your website, reading information from the site’s content. Note: it’s not extracting content, it’s merely perusing it. HTML is the code used to build websites, and these bots hop from coded content to coded content within your site to interpret your site’s purpose. They look at your headings (HTML has six heading levels, h1 through h6, and WordPress uses the same six), hyperlinks, and other bits to make their assessment.
Remember when I wrote about Headings & Website Accessibility? Well, it’s not just screen readers that use the HTML tags; search engines use them to crawl your page as well. They also use JSON-LD, which you can learn about in my post on structured data here.
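To make that a little more concrete, here’s a rough sketch in Python of the kind of "perusing" described above: it reads a page’s HTML and notes the headings and hyperlinks it finds. This is only an illustration, not how any real search engine’s crawler works. The names PageSurveyor and survey, and the sample page, are made up for this example.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class PageSurveyor(HTMLParser):
    """Hypothetical helper: notes the headings and links on one page."""

    def __init__(self):
        super().__init__()
        self.headings = []    # (tag, text) pairs, in page order
        self.links = []       # href values of <a> tags, in page order
        self._current = None  # heading tag we are currently inside, if any
        self._buffer = []     # text pieces collected for that heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._current = tag
            self._buffer = []
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Only keep text that sits inside a heading.
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            self.headings.append((tag, "".join(self._buffer).strip()))
            self._current = None


def survey(html):
    """Return (headings, links) found in an HTML string."""
    surveyor = PageSurveyor()
    surveyor.feed(html)
    surveyor.close()
    return surveyor.headings, surveyor.links


# Made-up sample page, purely for illustration.
SAMPLE_HTML = """
<h1>Dog Grooming in Denver</h1>
<p>We wash dogs. <a href="/services">See our services</a>.</p>
<h2>Pricing</h2>
<a href="/contact">Contact</a>
"""

headings, links = survey(SAMPLE_HTML)
print(headings)
print(links)
```

A real crawler would go on to visit each link it found and repeat the process, which is how it hops from page to page within your site.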
Website crawlability indicates the ease with which search engines can scan your page and understand its purpose.
If you have a particularly large or complex website, you may want to consider adding a sitemap. Sitemaps can help you guide crawlers to index your site the way you want them to.
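If you’re curious what a sitemap actually looks like, here’s a minimal sketch in Python that builds one in the standard XML sitemap format. The function name build_sitemap and the example.com addresses are placeholders I made up. In practice your SEO plugin or site builder will most likely generate this file for you.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Hypothetical helper: return sitemap XML listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder addresses, for illustration only.
pages = [
    "https://example.com/",
    "https://example.com/services/",
    "https://example.com/contact/",
]
print(build_sitemap(pages))
```

Each url entry is simply one page you’d like crawlers to know about.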
Web Scraping vs. Web Crawling
Scraping a website is not the same as crawling it. As discussed, crawling means scanning the page in order to classify it. It’s basically high-tech reading.
Website scraping is the act of pulling content off a website to use for other purposes. Put bluntly, website scraping is data extraction.
Crawling and the environment
Website crawling is energy-intensive; it takes a lot of electricity. It’s like Las Vegas, running 24/7 with the lights turned up to eleven.
If you have a larger site, you may want to reduce your crawling (and carbon) footprint by telling search engines which pages they should index and which ones they can ignore. There’s a link below to Yoast’s settings for this.
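One common way to tell crawlers which pages they can skip is a robots.txt file. Here’s a small sketch, using Python’s built-in robots.txt parser, of how a well-behaved bot would read those rules. The rules and URLs below are invented for the example, and is_crawlable is a made-up helper name. Note that robots.txt asks bots not to crawl a page; keeping a page out of the index is a separate setting.

```python
from urllib.robotparser import RobotFileParser

# Invented rules for illustration: ask all bots to skip tag and author archives.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
Disallow: /author/
"""


def is_crawlable(robots_txt, url, user_agent="*"):
    """Hypothetical helper: True if the rules let this bot fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for url in (
    "https://example.com/blog/what-is-website-crawling/",
    "https://example.com/tag/seo/",
):
    print(url, is_crawlable(ROBOTS_TXT, url))
```

Fewer pages crawled means less work for the bots, which is the whole idea behind trimming your crawling footprint.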
Crawling is the act of surveying sites so search engines can organize them into the system.
Crawlability is the ease with which search engines can scan your page and understand its purpose.
Scraping is the act of pulling content off a website to use for other purposes. Put bluntly, website scraping is data extraction.