Although no one outside Google knows its exact search algorithm, there are aspects of your website that you can optimize for better crawlability and indexability. Implementing these changes properly, in conjunction with a comprehensive SEO strategy, can increase your chances of ranking well in the search engine results pages (SERPs) and getting closer to that coveted first position.
You can have as many keyword-targeted pages with relevant content as you want, but they won’t be able to do much if they aren’t crawlable. Website crawlability is how well a search engine can access and crawl your site’s content without running into a broken link or dead end. If the bot encounters too many of these or a robots.txt file blocks it, the bot won’t be able to accurately crawl your site. When this is the case, searchers will likely have a hard time finding you.
Indexability, on the other hand, is Google’s ability to analyze your website’s pages and add them to its index. You can get a quick view of which pages are indexed by searching Google for “site:” followed by your domain (for example, site:example.com). The results show pages from that domain that are currently in Google’s index, although the list is not guaranteed to be exhaustive. If pages that you know should be there are missing, review your technical SEO to see whether something is preventing Google from indexing them.
What Makes a Good Site Structure
Clear navigation is one of the biggest contributors to a good site structure because it makes it easier for Google to both crawl and index your website. Your page hierarchy should go from broad to narrow. After the home page, move on to high-level category pages, sub-category pages, and then individual pages. When you have clear, top-down navigation, it’s easier for both users and crawlers to navigate your website and get to the page they’re looking for.
Common Issues of Crawlability
There are a lot of factors that impact the crawlability of a website, but the two best places to start are with the code and hosting.
On the code side, if your code isn’t crawler friendly, Google may not get a clear picture of what it is you do. AJAX-heavy sites are a well-known example of code that bots have trouble crawling.
On the hosting side, slow site speed and server errors can cause significant problems for search bots. If a request takes too long and times out, the bots won’t be able to access your page’s content.
Make sure that your URLs are easy to read—a user should be able to remember the page they’re on and search for it again without too much difficulty. If we go back to the navigation hierarchy example, it might look something like this:
- Home: example.com
- High-Level Category Page: example.com/services
- Sub-Category Page: example.com/services/seo
- Individual Page: example.com/services/seo/local
The high-level category page would be an overview of all the services a firm offers. The sub-category page talks about one service (SEO) in general, and the individual page focuses on one specific part of that service (local SEO). For ecommerce websites, you would have the main categories, sub-categories, and then specific products.
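To see how deep a page sits in a hierarchy like the one above, you can count the path segments in its URL. The following is a minimal sketch using the example.com URLs from the list; the helper name `url_depth` is an assumption made up for illustration, not part of any SEO tool.

```python
from urllib.parse import urlparse


def url_depth(url):
    """Return the number of path segments in a URL (the home page is 0)."""
    # urlparse only separates host from path when a scheme is present,
    # so add one for bare URLs such as "example.com/services".
    if "://" not in url:
        url = "https://" + url
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])


# The hierarchy from the example, from broad to narrow.
pages = [
    "example.com",
    "example.com/services",
    "example.com/services/seo",
    "example.com/services/seo/local",
]

for page in pages:
    print(url_depth(page), page)
```

A page whose depth is much greater than its neighbors may be worth a second look, since both users and crawlers need more steps to reach it.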
Your navigation should cover every page that you want the search engine to index.
You should have a clearly coded navigation that allows crawlers to follow links quickly and easily. Check for broken links on these pages to ensure the bots can crawl them properly.
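A broken-link check of a navigation block can be sketched with Python's standard library. The names `LinkCollector`, `fetch_status`, and `find_broken_links` are hypothetical, and treating any status of 400 or above (or no response at all) as "broken" is an assumption you may want to adjust for your own site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import urllib.error
import urllib.request


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch_status(url):
    """Return the HTTP status code for a URL, or None if no response arrives."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered with an error such as 404 or 500
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, timeout, or refused connection


def find_broken_links(html, base_url, get_status=fetch_status):
    """Return (url, status) pairs for links that fail or return 4xx/5xx."""
    collector = LinkCollector()
    collector.feed(html)
    broken = []
    for href in collector.links:
        url = urljoin(base_url, href)  # resolve relative links such as "/services"
        if not url.startswith(("http://", "https://")):
            continue  # skip mailto:, tel:, and similar non-web links
        status = get_status(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken
```

Passing the status lookup in as `get_status` keeps the link-walking logic separate from the network call, so you can try the function against a dictionary of known statuses before pointing it at a live site.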
Although you should make every effort to have clear navigation, sitemap.xml files can also help. Use them to make sure search engines can see all your indexable pages in case your navigation isn’t perfect. Depending on the size and layout of your website, either a single sitemap.xml or a sitemap_index.xml that points to several sitemaps will do the trick. You can also submit these files to Google Search Console, making it easier for bots to crawl and index exactly what you want them to.
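For a small site, a sitemap.xml file can be generated with a few lines of Python. This is a minimal sketch that lists only each page's location; the function name `build_sitemap` and the example.com URLs are assumptions for illustration, and a real sitemap may carry more fields than shown here.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return the text of a sitemap.xml file that lists the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # special characters are escaped for us
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap([
    "https://example.com/",
    "https://example.com/services",
    "https://example.com/services/seo",
]))
```

The sitemaps.org protocol limits a single file to 50,000 URLs, which is one reason larger sites split their URLs across several sitemaps and reference them from a sitemap index.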
Files You Should Have Indexed
Generally speaking, you want pages, images, PDFs, and videos indexed. The content on your pages provides you with the biggest opportunity to rank for relevant keywords in your industry because the crawler is able to interpret it. Bots can’t exactly “see” images and videos, but they can crawl the text attached to them, such as image alt text. Adding alt text that describes the visual can further tell your story.
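Because alt text is what the crawler reads in place of the image, it is worth auditing pages for images that lack it. The sketch below uses Python's built-in HTML parser; `MissingAltFinder` and `images_missing_alt` are hypothetical names, and flagging empty alt text is an assumption, since decorative images sometimes use an empty alt on purpose.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Record the src of every <img> whose alt text is absent or blank."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = attributes.get("alt")
        # Assumption: a blank alt is treated the same as a missing one.
        if alt is None or not alt.strip():
            self.missing.append(attributes.get("src") or "(no src)")


def images_missing_alt(html):
    """Return the src values of images in the HTML that have no usable alt text."""
    finder = MissingAltFinder()
    finder.feed(html)
    return finder.missing


page = """
<img src="team.jpg" alt="Our SEO team reviewing a site audit">
<img src="logo.png">
"""
print(images_missing_alt(page))
```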
There are a lot of pages you definitely want Google to crawl and index on your website so you can start to rank for relevant keywords in your industry. However, there are also a handful of things that you don’t want search engines to look at. You can use a robots.txt file to instruct bots not to crawl certain pages. Common entries in these files include internal directories, admin and author pages, tag pages, and shopping cart pages.
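Before publishing a robots.txt file, you can test which URLs it would block with Python's built-in `urllib.robotparser`. In this sketch the rules and paths (/admin/, /cart/, /tag/) are hypothetical examples rather than recommendations for any particular site, and `is_crawlable` is an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that keeps bots out of admin, cart, and tag pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
Disallow: /tag/

Sitemap: https://example.com/sitemap.xml
"""


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for url in [
    "https://example.com/services/seo",
    "https://example.com/cart/checkout",
    "https://example.com/admin/",
]:
    print(url, is_crawlable(ROBOTS_TXT, url))
```

Running a check like this against your most important URLs is a quick way to catch a rule that accidentally blocks pages you want crawled.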
What to Watch Out For
There will always be little things here and there to watch out for in terms of website crawlability and indexability, but some of the biggest things to look into are:
Orphaned pages are live pages with content that no other page on your website links to. The only way to get to them is by typing in the URL exactly. These pages aren’t discoverable unless they’re in the sitemap. Ideally, you will link to them from somewhere on your site, so the crawler can follow the link to the page more easily.
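One way to surface orphaned pages is to compare the URLs listed in your sitemap with the URLs that other pages actually link to. The sketch below assumes you already have both: `sitemap_urls` and `internal_links` are hypothetical inputs (for example, from a crawl export), and `find_orphaned_pages` is an illustrative name.

```python
def find_orphaned_pages(sitemap_urls, internal_links, home_url):
    """Return sitemap URLs that no other page links to.

    internal_links maps each page URL to the set of URLs it links to.
    The home page is excluded because it is the entry point of the site.
    """
    linked_to = set()
    for source, targets in internal_links.items():
        # A page linking to itself does not make it discoverable.
        linked_to.update(target for target in targets if target != source)
    return sorted(set(sitemap_urls) - linked_to - {home_url})


sitemap_urls = [
    "https://example.com/",
    "https://example.com/services",
    "https://example.com/services/seo",
    "https://example.com/landing/spring-promo",
]
internal_links = {
    "https://example.com/": {"https://example.com/services"},
    "https://example.com/services": {
        "https://example.com/",
        "https://example.com/services/seo",
    },
    "https://example.com/services/seo": {"https://example.com/services"},
    "https://example.com/landing/spring-promo": {"https://example.com/"},
}

print(find_orphaned_pages(sitemap_urls, internal_links, "https://example.com/"))
```

In this made-up data, the promo landing page links out to the home page but nothing links to it, so it is the one reported as orphaned.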
Google generally cannot index the content of multimedia such as Flash, Java applets, and audio. If your website relies heavily on these elements, it’s a good idea to include written content on the page, so bots can at least crawl the HTML portion.
Frames may look great, but they aren’t necessarily the best for SEO. They load your content, images, etc. independently from your actual website. This can cause significant problems for crawlability and indexability since bots often have trouble reading these elements. It can also cause loading issues for your users.
AJAX can still cause problems for search engines. Whenever possible, use cleanly coded HTML to make it easier for bots to crawl and index your content.
It’s important to note that these are all best practices; there is no one-size-fits-all solution to crawlability and indexability. However, by having a solid onsite/offsite SEO strategy in place and regularly checking these aspects of your website, you can start working your way up in the SERPs.