Whenever I think of Googlebot, I see a cute, smart WALL-E-like robot speeding off on a quest to find and index knowledge in all corners of yet unknown worlds. It’s always slightly disappointing to be reminded that Googlebot is ‘only’ a computer program written by Google that crawls the web and adds pages to its index. Here, I’ll introduce you to the crawler and show you what it does.
Googlebot? Web crawler? Spider? Huh?
All those terms mean the same thing: a bot that crawls the web. Googlebot crawls web pages via links. It finds and reads new and updated content and suggests what should be added to the index. The index, of course, is Google’s brain; this is where all the knowledge resides. Google uses a ton of computers to send its crawlers to every nook and cranny of the web to find these pages and see what’s on them. Googlebot is Google’s web crawler, or robot; other search engines have their own.
How does Googlebot work?
Googlebot uses sitemaps and databases of links discovered during previous crawls to determine where to go next. Whenever the crawler finds new links on a site, it adds them to the list of pages to visit next. If Googlebot finds changed or broken links, it makes a note of that so the index can be updated. The program itself determines how often it crawls each page. To make sure Googlebot can correctly index your site, you need to check your site’s crawlability: if your site is easy for crawlers to access, they will come around often.
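To make the idea of a “list of pages to visit next” more concrete, here is a toy sketch in Python. It is not how Googlebot is implemented; it only illustrates the basic loop under stated assumptions: a made-up sitemap (`SITEMAP_XML`) seeds the queue, and a hypothetical in-memory link graph (`LINK_GRAPH`) stands in for the links a real crawler would find by downloading pages. Each page is visited once, and newly discovered links are appended to the queue.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Made-up sitemap for an example site (standard sitemaps.org namespace).
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/</loc></url>
</urlset>"""

# Hypothetical link graph standing in for the web: page -> links found on it.
LINK_GRAPH = {
    "https://example.com/": ["https://example.com/about/", "https://example.com/blog/"],
    "https://example.com/blog/": ["https://example.com/blog/post-1/"],
    "https://example.com/about/": [],
    "https://example.com/blog/post-1/": ["https://example.com/"],
}

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> URL listed in a sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def crawl(seeds: list[str], links: dict[str, list[str]]) -> list[str]:
    """Visit pages breadth-first, queueing each newly discovered link only once."""
    frontier = deque(seeds)  # the list of pages to visit next
    seen = set(seeds)        # pages already queued or visited
    visited = []
    while frontier:
        url = frontier.popleft()
        visited.append(url)
        for link in links.get(url, []):
            if link not in seen:  # a new link: add it to the to-visit list
                seen.add(link)
                frontier.append(link)
    return visited


if __name__ == "__main__":
    for page in crawl(sitemap_urls(SITEMAP_XML), LINK_GRAPH):
        print(page)
```

A queue plus a `seen` set is the simplest way to make sure every page is queued only once. A real crawler also decides how often to revisit each page, which this sketch leaves out.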
There are several different robots. For instance, AdSense and AdsBot check ad quality, while Mobile Apps Android checks Android apps. For us, these are the most important ones:
How Googlebot visits your site
To find out how often Googlebot visits your site and what it does there, you can dive into your log files or open the Crawl section of Google Search Console. If you want to do really advanced stuff to optimize the crawl performance of your site, you can use tools like Kibana or the SEO Log File Analyser by Screaming Frog.
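If you want a quick look before reaching for a dedicated tool, a few lines of Python can count how many requests per day claim to come from Googlebot. This is a minimal sketch that assumes your server writes the Apache/Nginx “combined” log format; adjust the regular expression if yours differs. The sample lines are made up. Because the sketch only looks at the user-agent string, which anyone can fake, treat the counts as an upper bound until the IP addresses are verified.

```python
import re
from collections import Counter

# Assumes the Apache/Nginx "combined" log format:
# IP - user [day/Mon/year:HH:MM:SS zone] "request" status size "referrer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]*\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits_per_day(lines):
    """Count requests per day whose user-agent string mentions Googlebot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        # Lines in another format are skipped; only the user-agent is checked.
        if match and "Googlebot" in match.group("agent"):
            hits[match.group("day")] += 1
    return hits


if __name__ == "__main__":
    sample = [  # made-up log lines
        '192.0.2.1 - - [10/Mar/2024:06:25:01 +0000] "GET / HTTP/1.1" 200 5120 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
        '203.0.113.7 - - [10/Mar/2024:06:26:40 +0000] "GET /blog/ HTTP/1.1" 200 2048 "-" '
        '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    ]
    print(googlebot_hits_per_day(sample))
```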
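Google does not share lists of the IP addresses the various Googlebots use, since these addresses change often. To find out whether a real Googlebot visits your site, you can do a reverse DNS lookup. Spammers and fakers can easily spoof a user-agent name, but not an IP address. Google describes the check in two steps: a reverse DNS lookup on the visiting IP address should return a hostname in googlebot.com or google.com, and a forward DNS lookup on that hostname should point back to the same IP address.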
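A minimal Python sketch of this verification, using only the standard `socket` module: look up the hostname for the IP address, check that it belongs to googlebot.com or google.com, then look the hostname up again and confirm it resolves to the original IP address. It assumes an IPv4 address (`socket.gethostbyname_ex` does not return IPv6 addresses), and the lookup functions are passed in as parameters only so the logic can be tried without network access.

```python
import socket

# Domains that genuine Googlebot hostnames end in, per Google's verification
# guidance (assumption: extend this tuple if Google documents more domains).
GOOGLE_CRAWLER_DOMAINS = (".googlebot.com", ".google.com")


def is_google_hostname(hostname: str) -> bool:
    """True if the hostname belongs to one of Google's crawler domains."""
    return hostname.rstrip(".").lower().endswith(GOOGLE_CRAWLER_DOMAINS)


def is_real_googlebot(ip: str,
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex) -> bool:
    """Reverse DNS lookup, domain check, then forward lookup back to the same IP."""
    try:
        hostname, _aliases, _ips = reverse(ip)          # step 1: IP -> hostname
        if not is_google_hostname(hostname):
            return False
        _name, _aliases, addresses = forward(hostname)  # step 2: hostname -> IPs
    except OSError:  # herror/gaierror: the lookup failed, so not verified
        return False
    return ip in addresses


if __name__ == "__main__":
    import sys
    for address in sys.argv[1:]:
        print(address, is_real_googlebot(address))
```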
You can use the robots.txt file to control how Googlebot visits your site, or parts of it. Watch out, though: if you do this the wrong way, you might stop Googlebot from coming altogether. This can take your site out of the index. There are better ways to prevent your pages from being indexed, such as a noindex robots meta tag.
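Before you upload a changed robots.txt, it helps to check which URLs it would block. Python’s standard `urllib.robotparser` can do a rough check; the rules and URLs below are made-up examples. One caveat: this parser applies rules in file order and does not understand wildcards inside paths, whereas Google picks the most specific matching rule and does support wildcards, so treat the result as a sanity check rather than Google’s verdict.

```python
from urllib.robotparser import RobotFileParser


def googlebot_may_fetch(robots_txt: str, url: str) -> bool:
    """Return True if these robots.txt rules let a crawler named Googlebot fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


# Blocks only one directory, and only for Googlebot.
SECTION_BLOCK = """\
User-agent: Googlebot
Disallow: /private/
"""

# The classic mistake: a single slash blocks every crawler from the whole site.
SITE_BLOCK = """\
User-agent: *
Disallow: /
"""

if __name__ == "__main__":
    for rules in (SECTION_BLOCK, SITE_BLOCK):
        for url in ("https://example.com/private/report", "https://example.com/blog/"):
            print(url, googlebot_may_fetch(rules, url))
```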
Google Search Console
Search Console is one of the most important tools for checking the crawlability of your site. There, you can verify how Googlebot sees your site. You’ll also get a list of crawl errors for you to fix. In Search Console, you can also ask Googlebot to recrawl your site.
Optimize for Googlebot
Getting Googlebot to crawl your site faster boils down to removing the technical barriers that keep the crawler from accessing your site properly. It is a fairly technical process, but you should make yourself familiar with it. If Google can’t crawl your site properly, it can never make it rank for you. Find those errors and fix them!
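One way to start finding such errors yourself is to request your important URLs and look at the HTTP status codes they return. The sketch below uses only Python’s standard library. The URL list is a hypothetical placeholder, the user-agent string `site-health-check/0.1` is a made-up name for your own checker (it deliberately does not pretend to be Googlebot), and the grouping of status codes is a simplification.

```python
import urllib.error
import urllib.request

# Hypothetical URLs to check; replace them with pages from your own sitemap.
URLS = ["https://example.com/", "https://example.com/old-page/"]


def describe_status(status: int) -> str:
    """Group an HTTP status code into a rough crawl-health category."""
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"  # rare here: urlopen follows redirects itself
    if status in (404, 410):
        return "not found"
    if 400 <= status < 500:
        return "client error"
    if 500 <= status < 600:
        return "server error"
    return "unknown"


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status for url after any redirects."""
    request = urllib.request.Request(url, headers={"User-Agent": "site-health-check/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:  # 4xx and 5xx responses raise HTTPError
        return error.code


def check_urls(urls, fetch=fetch_status):
    """Map each URL to (status, category); network failures get status 0."""
    report = {}
    for url in urls:
        try:
            status = fetch(url)
        except OSError:  # DNS failure, refused connection, timeout (URLError is an OSError)
            status = 0
        report[url] = (status, describe_status(status))
    return report


if __name__ == "__main__":
    for url, (status, category) in check_urls(URLS).items():
        print(f"{status} {category:12} {url}")
```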
Googlebot is the little robot that visits your site. If you’ve made technically sound choices for your site, it’ll come often. If you regularly add fresh content, it’ll come around even more often. Whenever you’ve made large-scale changes to your site, you might have to call that cute little crawler to come at once, so the changes are reflected in the search results as soon as possible.