Crawl budget is a top area of focus for many SEOs in 2017. This blog post will use a healthy dose of personification to address the components that make up Googlebot’s crawl budget, highlight a few examples of pitfalls, and suggest technical fixes to get Googlebot crawling your entire site in no time.
Crawl Budget Formula
It’s not every day Google lets you look into its black box. Since I put my SEO hat on back in 2012, specific, detailed input from Google on how search works has been few and far between. Like most high-level strategies, optimizing crawl budget has almost always been a nebulous process propped up by the research of independent SEOs and agencies. That is, until Gary Illyes was lovely enough to spell it out for us with one simple formula:
Crawl budget = crawl demand + crawl rate
Where crawl rate refers to how easily Googlebot can navigate your site and crawl demand is Googlebot’s assessment of your site’s popularity.
Crawl Demand: Is Your Site Engaging?
The two Google-defined factors here are popularity and staleness. This is Googlebot’s attempt to find the perfect crawling rate for super engaging sites like amazon.com as well as less visited, but still indexed, sites like carlbarnes.com.
Depending on your service agreement with clients, these aspects could be the ones you have the least control over. However, it is important to make sure your content not only targets a specified audience, but also provides value to it. At the very minimum, solid basic SEO and killer content will set you up for success. Assuming your client has the means and desire to work with you to give potential customers what they’re looking for in a site, let’s move on to the more technical aspect: crawl rate.
Crawl Rate: Is Your Site Accessible?
Google, via Gary Illyes, explicitly defines crawl rate as:
The number of simultaneous parallel connections Googlebot may use to crawl the site, as well as the time it has to wait between the fetches.
In other words, Googlebot decides how many pages it wants to crawl as it approaches your site. Once it is navigating, the crawl limit can increase if Googlebot has an easy time understanding your content. So how do we make things easy on Googlebot? Remove the common crawling pitfalls related to low-value-add URLs.
Remove Low-Value-Add URLs
Specifically, target the ones related to faceted navigation, URL parameters, and on-site duplicate content.
Faceted navigation refers to how your site generates URLs as Googlebot navigates between pages. An ideal journey features uniquely generated pages in a logical hierarchy.
As we all know, this isn’t often the case with larger sites. One of the biggest problems is allowing your site to convert session-generated values into (most likely infinite) URL parameters that are crawlable and indexable, but not useful in search results. Session IDs and product sort filters are typical examples.
The best way to fix these faceted navigation problems is to pare down options where you can, and to block the rest as best you can via your sitemap, page-level robots meta tags, and robots.txt file.
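If you want to sanity-check which URLs a set of robots.txt rules would block before you ship them, here is a minimal sketch using Python’s standard library. The rules, paths, and example.com URLs are hypothetical placeholders, not recommendations for any real site. Note that this parser does plain path-prefix matching, so the sketch sticks to path-based rules and may not reflect every detail of how Googlebot itself reads a robots.txt file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; the paths are assumptions for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /products/filter/
Disallow: /session/
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def blocked_urls(parser, urls, user_agent="Googlebot"):
    """Return the URLs that the rules disallow for the given user agent."""
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical URLs a crawl of the site might surface.
CANDIDATES = [
    "https://example.com/products/shoes",
    "https://example.com/products/filter/color-red",
    "https://example.com/session/abc123/cart",
]

if __name__ == "__main__":
    for url in blocked_urls(build_parser(ROBOTS_TXT), CANDIDATES):
        print("blocked:", url)
```

Running a crawl export through a check like this is a quick way to confirm that the rules catch the faceted URLs you meant to block and nothing you want indexed.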
Another common aspect of faceted navigation, URL parameter management gets to the heart of having one page accessible by only one URL. Regardless of navigation. No Duplicates.
A site with extraneous URL parameters all but guarantees duplicate content, which can cause Googlebot to drastically cut your crawl limit. Common fixes include working with your developers to remove any unnecessary parameters and writing logic that ensures only one version of each page’s URL is shown to Googlebot upon request.
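To make the "one page, one URL" idea concrete, here is a rough sketch of the kind of normalization logic a developer might write. The parameter names in IGNORED_PARAMS are assumptions picked for illustration; which parameters are actually safe to drop depends entirely on your site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters assumed not to change what the page is about.
IGNORED_PARAMS = {"sessionid", "sort", "utm_source", "utm_medium", "utm_campaign"}


def canonical_url(url):
    """Collapse URL variants of a page into one canonical form.

    Drops ignored parameters, sorts the remaining ones so their order no
    longer matters, lowercases the host, and removes any fragment.
    """
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


if __name__ == "__main__":
    print(canonical_url("https://Example.com/shoes?sort=price&color=red&sessionid=abc123"))
```

The output of a function like this could feed a canonical link tag or a redirect rule, so that every variant Googlebot stumbles on points back to a single URL.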
A lot of these fixes require both someone with SEO knowledge and someone who can (and wants to) develop the necessary changes. Sometimes it’s a quick process; most times it’s… well, longer. In the meantime, here’s what you can do in the short term.
What Can SEOs Independently Do Today?
Source and remove on-page duplicate content:
You always have to account for actual pages of duplicate content on your site! Sourcing and cleaning can be as simple as clicking through your site and noting similar pages. Another common pitfall is having multiple pages that each carry only a small amount of unique content. For instance, what we see below:
Potential fixes include deleting low-value-add pages, adding more unique content to each page, or using on-page directives (prev/next, canonical links, et cetera).
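If clicking through the site by hand gets tedious, a small script can flag candidate pairs for you. The sketch below assumes you already have each page’s main text extracted into a dictionary; the PAGES data and the 0.8 threshold are made up for illustration, and the word-level comparison is just one simple way to score similarity.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical extracted page text, keyed by URL path.
PAGES = {
    "/shoes-red": "Our shoes ship free and returns are easy. Color: red",
    "/shoes-blue": "Our shoes ship free and returns are easy. Color: blue",
    "/contact": "Contact our support team by phone or email",
}


def similarity(text_a, text_b):
    """Score two texts from 0.0 to 1.0 by comparing their word sequences."""
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    return SequenceMatcher(None, words_a, words_b).ratio()


def near_duplicates(pages, threshold=0.8):
    """Return (url_a, url_b, score) for every pair at or above the threshold."""
    pairs = []
    for (url_a, text_a), (url_b, text_b) in combinations(pages.items(), 2):
        score = similarity(text_a, text_b)
        if score >= threshold:
            pairs.append((url_a, url_b, round(score, 2)))
    return pairs


if __name__ == "__main__":
    for url_a, url_b, score in near_duplicates(PAGES):
        print(url_a, url_b, score)
```

Comparing every pair gets slow on big sites, so treat this as a spot check for a section or template rather than a full-site audit.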
Get into your Crawl Error Report in Google Search Console:
Use this report to root around and start diagnosing problems. These are the URLs that Googlebot knows are causing issues. Sourcing and fixing prioritized site errors in the code or with redirects can only improve your standing in Googlebot’s eyes. Speaking of redirects…
If you’ve ever migrated any content on your site, set up global redirects, or crawled your site with SEO tools, odds are you are aware of redirect chains. It’s another common pitfall that just takes a bit of elbow grease to clean up.
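As a starting point for that elbow grease, here is a minimal sketch that finds chains in a redirect map, for example one exported from your crawler or server config. The REDIRECTS paths are hypothetical, and the sketch works purely on the mapping rather than making live requests.

```python
# Hypothetical redirect map: source path -> destination path.
REDIRECTS = {
    "/old": "/interim",
    "/interim": "/new",
    "/promo": "/sale",
}


def redirect_chain(start, redirects, max_hops=10):
    """Follow redirects from start and return every URL visited, in order.

    Stops at a URL with no further redirect, at a loop, or after max_hops.
    """
    chain = [start]
    seen = {start}
    while chain[-1] in redirects and len(chain) <= max_hops:
        target = redirects[chain[-1]]
        chain.append(target)
        if target in seen:
            break  # redirect loop: stop instead of cycling forever
        seen.add(target)
    return chain


def find_chains(redirects):
    """Return the sources that need more than one hop to reach their final URL."""
    chains = {}
    for source in redirects:
        chain = redirect_chain(source, redirects)
        if len(chain) > 2:  # source -> intermediate -> ... means a chain
            chains[source] = chain
    return chains


if __name__ == "__main__":
    for source, chain in find_chains(REDIRECTS).items():
        print(" -> ".join(chain))
```

Each flagged source is a candidate for pointing straight at the last URL in its chain, so Googlebot gets there in a single hop.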