Being Freed

Website Crawling

In website crawling, a program called a bot thumbs through your site to understand its purpose and content so the search engine can index it correctly.
Website crawling -- a graphic depicting a Pac-man board

What is website crawling?

Website crawling is done by a bot, which peruses web pages in order to build an index.  Search engines do it so they can understand your site and its content, and work out how to properly funnel it to their users.

Website crawling and indexing are the first two steps of SEO.  Search engines crawl your website, then index it, then feed it through an algorithm to deliver it as a result when people search for your service.  Weeeeelll… they’ll deliver your site if you’ve done your SEO homework.  If search engines don’t know how to categorize you, they can’t recommend your site or they’ll recommend it in the wrong places. 

Search engines such as Google, Bing, and DuckDuckGo are nothing but oversized, modern card catalogs for the internet.  Crawling is the act of surveying sites so search engines can organize them into the system, making sure they’re filed right.  There’s no Dewey Decimal System, and the process is rather opaque, but with the proper SEO you can make sure website crawlers understand your content the way you want them to.

How website crawling works

A program called a bot (or crawler, or spider) thumbs through your website, pulling information from the site’s content.  Note: it’s not extracting content, it’s merely perusing your content.  HTML is the code used to build websites, and these bots hop from coded content to coded content within your site to interpret your site’s purpose.  They look at your headings (HTML has six heading levels, h1 through h6, and WordPress works with all six), hyperlinks, and other bits to make their assessment.
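To make that a little more concrete, here is a minimal Python sketch of the "perusing" step.  It is not how any real search engine's bot is written; it only shows the idea of reading a page's headings and collecting the links a bot could hop to next.  The `PageSurvey` class, the `survey` function, and the sample HTML are all made up for illustration.

```python
from html.parser import HTMLParser

# The six HTML heading levels.
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class PageSurvey(HTMLParser):
    """Hypothetical helper: collects headings and links from one page."""

    def __init__(self):
        super().__init__()
        self.headings = []    # (tag, text) pairs, in page order
        self.links = []       # href values, in page order
        self._current = None  # heading tag we are inside, if any
        self._buffer = []     # text pieces of the current heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._current = tag
            self._buffer = []
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            text = "".join(self._buffer).strip()
            self.headings.append((tag, text))
            self._current = None


def survey(html):
    """Return (headings, links) found in an HTML string."""
    parser = PageSurvey()
    parser.feed(html)
    parser.close()
    return parser.headings, parser.links


# Made-up sample page for illustration.
SAMPLE_HTML = """
<h1>Website Crawling</h1>
<p>Read about <a href="/structured-data">structured data</a>.</p>
<h2>How website crawling works</h2>
<a href="/headings-accessibility">Headings</a>
"""

headings, links = survey(SAMPLE_HTML)
print(headings)
print(links)
```

A real crawler would then fetch each of those links and repeat the process, which is the "hopping" part.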

Remember when I wrote about Headings & Website Accessibility?  Well, it’s not just screen readers that use the HTML tags; search engines use them to crawl your page, as well.  They also use JSON-LD, which you can learn about in my post on structured data.
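If you're curious what reading JSON-LD might look like from the bot's side, here is a rough Python sketch.  It assumes the structured data sits in a `<script type="application/ld+json">` tag, which is the usual way to embed it; the `JsonLdCollector` class and the sample page are invented for this example and say nothing about how a particular search engine actually does it.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Hypothetical helper: gathers JSON-LD blocks from one page."""

    def __init__(self):
        super().__init__()
        self.items = []        # parsed JSON-LD objects
        self._in_jsonld = False
        self._chunks = []      # raw text of the current script tag

    def handle_starttag(self, tag, attrs):
        is_jsonld = dict(attrs).get("type") == "application/ld+json"
        if tag == "script" and is_jsonld:
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.items.append(json.loads("".join(self._chunks)))
            self._in_jsonld = False


# Made-up sample page for illustration.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org",
 "@type": "BlogPosting",
 "headline": "Website Crawling"}
</script>
</head><body><h1>Website Crawling</h1></body></html>
"""

collector = JsonLdCollector()
collector.feed(SAMPLE_PAGE)
collector.close()

for item in collector.items:
    print(item["@type"], "-", item["headline"])
```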

Website crawlability

Website crawlability indicates the ease with which search engines can scan your page and understand its purpose. 

If you have a particularly large or complex website, you may want to consider adding a sitemap.  Sitemaps can help you guide crawlers to index your site the way you want them to.
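For the curious, an XML sitemap is just a list of page addresses in a standard format.  Most people let a plugin generate it, but here is a small Python sketch of what one boils down to.  The `build_sitemap` function and the example.com URLs are placeholders I made up; the namespace is the one from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Hypothetical helper: return a minimal XML sitemap as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs for illustration only.
sitemap_xml = build_sitemap([
    "https://example.com/",
    "https://example.com/website-crawling/",
])
print(sitemap_xml)
```

A real sitemap file would also start with an XML declaration and can carry optional details such as a last-modified date for each page.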

Web Scraping vs. Web Crawling

Scraping a website is not the same as crawling it.  As discussed, crawling means scanning the page in order to classify it.  It’s basically high-tech reading.

Website scraping is the act of pulling content off a website to use for other purposes.  Put bluntly, website scraping is data extraction. 

A graphic with paw prints running through it and the words, "Website crawling - the act of surveying web pages in order to index them".
A graphic with a snow shovel and the words, "Website scraping - the act of scanning and lifting copy from web pages".

Crawling and the environment

Website crawling is energy-intensive; it takes a lot of electricity.  It’s like Las Vegas, running 24/7 with the lights turned up to eleven.  

If you have a larger site, you may want to reduce your crawling (and carbon) footprint by telling search engines which pages they should index and which ones they can ignore.  There’s a link below to Yoast’s settings for this.
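One common way to tell bots which parts of a site to skip is a robots.txt file.  This is not Yoast's mechanism, just a sketch of the general idea, and strictly speaking robots.txt controls what gets crawled rather than what gets indexed.  The rules and example.com URLs below are made up; the Python snippet uses the standard library's `urllib.robotparser` to show how a well-behaved bot would read them.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt rules for illustration: ask every bot
# to stay out of the admin area and the tag archives.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /tag/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URLs a bot might want to visit.
for url in [
    "https://example.com/website-crawling/",
    "https://example.com/tag/seo/",
    "https://example.com/wp-admin/options.php",
]:
    verdict = "crawl" if parser.can_fetch("*", url) else "skip"
    print(verdict, url)
```

Every page a bot skips is a page it doesn't have to fetch and process, which is where the energy savings come from.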

Vocabulary

Crawling is the act of surveying sites so search engines can organize them into the system.

Crawlability is the ease with which search engines can scan your page and understand its purpose.

Scraping is the act of pulling content off a website to use for other purposes.  Put bluntly, website scraping is data extraction. 

digital nomad hiker drysuit diver queer woman sober ukulele player scorpio Tough Mudder goofball

Freed scuba diving with her ukulele

All content © by Freed