SEO 101: How Search Engines Work

The general idea of how search engines work is pretty straightforward - a query goes in, results come out. Want to know the best plant milk to buy for your morning coffee? Google will return a long list of ecommerce stores, blog posts, reviews, and more to help you make your decision. But have you ever wondered exactly how a search engine puts together those results pages?

In our previous article we mentioned crawling, indexing, and ranking; these are the three primary functions that search engines are based on. In this article, we’re going to dive even deeper into those three functions, so you have a better understanding of how search engines take all the pages the internet has to offer and find the answers people are looking for.

The Search Engine Process

A lot of work goes into a query before a user ends up with a search engine results page (SERP). In order for Google to provide a list of up-to-date, relevant search results, it needs to first perform three basic functions:

Crawling - The search engine sends bots to trawl through pages around the internet looking for content. It does this by downloading pages, following URLs, and discovering more content.

Indexing - The search engine stores and organizes the content it found. This includes information about key signals such as keywords, type of content, how recent the content is, and user engagement. 

Ranking - The search engine uses an algorithm to determine which content in its index is most relevant to a search query.

The process then goes a little something like this: A user inputs a query into Google. The search engine’s algorithm looks through its index for matching content and weighs those matches alongside factors like the user’s location, device, language, and previous search history. The search engine results page is then displayed, with the results ranked according to relevance.

Everything you do in Search Engine Optimization relies on influencing these three functions. You want to optimize the page content on your store so that Google’s crawlers find it, index it accordingly, and rank it favorably in SERPs. This is why it’s important to first understand how it all works before you start building your SEO strategy.

Now that we can see how these fit together, let’s get into the details. We’ll primarily focus on Google’s various programs and algorithms, as Google is the search engine market leader by a long way, accounting for over 70% of desktop and over 90% of mobile searches.

Crawling

It all begins with a bit of internet sleuthing. Crawling is the process by which search engines discover content, including webpages, videos, images, and documents. Search engines send crawlers to find new and updated content by downloading pages, picking out the links on those pages, and following those links to discover further pages.
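To make the link-following idea concrete, here is a minimal sketch of a crawler in Python. It is not how Googlebot is implemented; it simply assumes a tiny, made-up store held in memory (the `PAGES` dictionary and its URLs are hypothetical) so the example runs without touching the network.

```python
from collections import deque

# Hypothetical in-memory "store": each URL maps to the links found on that page.
PAGES = {
    "/": ["/collections/milk", "/blogs/news"],
    "/collections/milk": ["/products/oat-milk", "/products/soy-milk"],
    "/blogs/news": ["/products/oat-milk"],
    "/products/oat-milk": ["/"],
    "/products/soy-milk": [],
    "/pages/hidden-test": [],  # no page links here, so a crawler never reaches it
}


def crawl(start):
    """Breadth-first crawl: 'download' a page, then queue every unseen link on it."""
    discovered = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for link in PAGES.get(url, []):
            if link not in seen:
                seen.add(link)
                discovered.append(link)
                queue.append(link)
    return discovered


print(crawl("/"))
```

Starting from the home page, the sketch reaches every page that is linked from somewhere, while `/pages/hidden-test` is never discovered because nothing links to it.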

How crawlers find your content 

We’ll go into this topic in greater detail in a future article, but so you’re familiar with what it means, we’re going to talk about a file called robots.txt, which helps direct crawlers to important content. We’ve just covered that crawlers - also known as bots; Google’s is called Googlebot - are sent out to retrieve pages and discover new links and therefore more content. It’s easy to assume then that those crawlers are discovering all of your content, but without a little guidance they might miss the most important pages.

There is plenty of content on a Shopify store you might want to direct crawlers away from, such as duplicate URLs created by sort-and-filter parameters, pages with outdated or unimportant content, or hidden pages you use for testing. To steer crawlers away from that content, you can use a robots.txt file on your store. In short, this file is stored in the root directory of your store and tells crawlers which content they should and shouldn’t crawl. This helps to prioritize content, as when Googlebot finds this file it will read it and follow the suggestions you’ve set out for crawling. Without one, it will simply proceed to crawl the entire store.
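As a rough illustration of how a well-behaved crawler reads these rules, the sketch below uses Python’s standard-library `urllib.robotparser`. The rules and the `example-store.com` URLs are made up for the example; they are not Shopify’s defaults.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /pages/test-
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse the rules without fetching anything


def allowed(path):
    """Return True if a crawler identifying as Googlebot may fetch this path."""
    return parser.can_fetch("Googlebot", "https://example-store.com" + path)


print(allowed("/products/oat-milk"))   # product page: not covered by any rule
print(allowed("/search"))              # internal search results: disallowed
print(allowed("/pages/test-landing"))  # test page: disallowed by prefix
```

Each `Disallow` line here matches by path prefix, which is why `/pages/test-` covers every URL that begins that way.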

The importance of site navigation

Site navigation is an important factor in the user experience of your store, and it’s equally important when it comes to technical SEO. Crawlers jump from page to page using links, and this means the navigational structure of your store is crucial to them finding your content and indexing it. If that path isn’t clear then they can’t crawl, they can’t index, and your store is pretty much invisible to search engines - and we definitely don’t want that. Ensure all important pages are linked in your main navigation, and that you’re linking back to relevant pages on your site. Equally, if you’re redirecting any pages for whatever reason, update the links in your site navigation to point directly to the new URL rather than leaving the old link in place and relying on the redirect. Redirects slow down the crawling process, making it much less efficient and making crawlers less likely to discover new content.
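One way to picture the redirect advice is as a small clean-up step. The sketch below assumes you have a map of old URLs to new ones (the `REDIRECTS` and `NAV_LINKS` values are hypothetical) and rewrites each navigation link to its final destination, so a crawler never has to hop through a redirect.

```python
# Hypothetical redirect map: old URL -> URL it now redirects to.
REDIRECTS = {
    "/collections/old-milk": "/collections/milk",
    "/collections/milk": "/collections/plant-milk",
}

# Hypothetical navigation links, one of which still points at an old URL.
NAV_LINKS = ["/", "/collections/old-milk", "/blogs/news"]


def resolve(url, redirects, max_hops=10):
    """Follow a chain of redirects to its final URL, giving up after max_hops."""
    hops = 0
    while url in redirects and hops < max_hops:
        url = redirects[url]
        hops += 1
    return url


updated_nav = [resolve(link, REDIRECTS) for link in NAV_LINKS]
print(updated_nav)
```

The `max_hops` limit is only a guard against redirect loops in the sample data; the old milk collection link ends up pointing straight at `/collections/plant-milk`.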

Why your page might not appear in search results

It’s understandable to think that if the page is there, the crawlers will find it and it’ll start popping up in search results. However, it’s a little more complicated than that, and if you find that some of your content isn’t appearing there are a few reasons why this might be the case. Search engines might provide instant results when you enter a search query, but with so many pages to crawl, index, and retrieve, the index won’t always be up-to-date. If your site is brand new then there’s a good chance it simply hasn’t been crawled yet, and if that hasn’t happened then it won’t appear on SERPs. If this isn’t the case then it may have something to do with your site’s navigation making it difficult to crawl - as we’ve just covered, this is vital. Sites can also be flagged for using spam tactics such as keyword stuffing, which means Google is less likely to display your page and may even go so far as to penalize it.

Indexing

Once the crawlers are finished discovering content, it all gets stored in a massive databank called an index. All the pages that have been crawled are analyzed and the content is extracted to highlight keywords, the type of content on the page, etc., then filed away ready to be drawn from by the ranking algorithm when a user enters a search query. Think of it like a massive, super-efficient encyclopedia of everything on the internet. 

Google’s inverted index

Scanning every stored page each time someone enters a search term would take far too long, so Google relies on a data structure called an inverted index. (Caffeine, which you may see mentioned alongside it, is the name of the indexing system Google rolled out in 2010, not the index structure itself.) An inverted index stores every word found on the pages it covers, along with pointers to the pages, and the positions within them, where that word appears. Before being stored, the text is split into individual words in a process called tokenization, and those words are typically normalized, for example by lowercasing them or reducing them to their root form, so that close variants match. In other words, rather than going through each page to find the keywords that match the search query, Google starts with the keyword and looks up which pages contain it.
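The idea is easier to see with a toy example. The sketch below builds a very small inverted index in Python; the page URLs and text are made up, the tokenizer is a deliberately simple lowercase word splitter, and none of this reflects how Google’s own index is built.

```python
import re
from collections import defaultdict

# Hypothetical pages and their text content.
PAGES = {
    "/products/oat-milk": "Creamy oat milk for coffee",
    "/products/soy-milk": "Organic soy milk",
    "/blogs/news/coffee-guide": "How to brew better coffee at home",
}


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the pages containing it and its positions on each page."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for position, word in enumerate(tokenize(text)):
            index[word].setdefault(url, []).append(position)
    return index


def search(index, query):
    """Return the set of pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], {}))
    for word in words[1:]:
        results &= set(index.get(word, {}))
    return results


index = build_index(PAGES)
print(index["milk"])
print(search(index, "milk coffee"))
```

Answering a query becomes a lookup per word followed by an intersection of the matching page sets, with no need to rescan any page text.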

Cached pages

You might be wondering how search engines store so many webpages and their content when sites are constantly updating - a new product description here, a fresh blog there. The answer is caching. Google stores a cached version of a webpage in its index, that being a sort of snapshot of that page the last time it was crawled. How often a page is crawled depends on a range of factors, including how frequently it’s updated and how established the site is. A well-known, frequently updated news site will be crawled more often than a personal blog that’s only updated a couple of times a month, for example. If you want to view the most recent cache of a page, you can do so by searching for the page and clicking the arrow beside the main link.

This will then display the most recent version of the page Google has stored in its index, along with when it was last cached.

Ranking

This is the function we’re most concerned with when it comes to SEO. Once the crawlers have found content and Google has indexed it, it comes time to rank all that content for the search terms it’s relevant to. Ranking is what ensures that search engines show the most relevant results to users when they enter a query. Search engines use algorithms that consider a number of factors to determine ranking, and these algorithms are subject to many small changes over time. These changes are designed to improve the user experience and give more accurate results.

Google RankBrain and BERT

Google is always developing new ways to better the search experience for users, and artificial intelligence has been one of its biggest advancements. We briefly covered RankBrain and BERT in a previous article; however, given how important they are to Google’s ranking algorithm, we reckon it’s worth going over them again. Released in 2015, RankBrain is Google’s machine learning program that improves its search performance over time by learning from the searches people make. For example, if RankBrain notices a lower-ranked link performing better for a search term, it may bump it up in the SERP ranking, as it has learned that the link in question satisfies that search term more effectively. BERT, on the other hand, is Google’s AI program that seeks to understand and better interpret the nuances of language used in search terms. It aims to contextualize words used in a search term based on the words around them, with a view to delivering more specific results. For some specific examples of how this works in comparison to before BERT’s implementation in 2019, see this post from Google. The two AI programs work in tandem to determine which results are the most valuable for different queries, so they are important to understand when we talk about ranking.

Links, Content, Engagement

There are many tricks and tactics SEO experts use to boost the ranking of a page, but three of the most important factors are links, content, and engagement.

Links are not only what crawlers use to find your store, they’re also important when it comes to Google’s algorithm deciding where to rank a page. Backlinks - also known as inbound links - are links from external sites that direct to your site. Internal links are those which exist on your site that direct to elsewhere on your site. Backlinks are vital to building authority or trust with Google and other search engines; the more external sites that link to a page, the more a search engine sees that page as worthwhile. Think of it like when you get a review for your Shopify store - a person unrelated to your business thinks the product is great, and this recommendation means more to a potential customer than you saying your product is great. Equally, if the source of the backlink is itself seen as high-quality and trustworthy by search engines, then its link carries more authority than one from a poor-quality source. Google’s PageRank is a core component of its ranking algorithm, which determines the value of a page by measuring the number and quality of the backlinks that point to it. The thinking behind this is that the more quality backlinks a page has, the more trustworthy and relevant it is.
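For a feel of how backlinks can be turned into a score, here is a minimal sketch of the PageRank idea as it was originally published, not Google’s current production algorithm, which isn’t public. The link graph, the page names, and the 0.85 damping factor are assumptions chosen for illustration.

```python
# Hypothetical link graph: each page maps to the pages it links out to.
LINKS = {
    "store": ["blog"],
    "blog": ["store"],
    "review-site": ["store"],
    "forum": ["store", "blog"],
}


def pagerank(links, damping=0.85, iterations=50):
    """Repeatedly pass each page's score along its outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(page, round(score, 3))
```

In this sample graph the store page ends up with the highest score because it receives the most backlinks, and a link from a page that itself scores well passes along more value than one from a page nobody links to.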

When you follow a link suggested by Google, you’re likely hoping for the page you land on to have some really great content. Page content includes everything on the page that the user is going to consume, so obviously that includes text but also images, video, documents, etc. This content helps search engines to determine whether or not your page is worth ranking for different search terms. Your page content should focus on the user’s intent - why would they want to view your store? What search queries might they need answers for? Satisfying user intent is what search engine ranking focuses on, and it’s the reason why programs such as RankBrain exist.

And finally, let’s talk about engagement. Some of the metrics we focus on when it comes to page engagement are how many times that page has been visited (clicks), the amount of time a user spends on that page, and the percentage of users who visited one page and left the site (bounce rate). You could have amazing content, but if no one is visiting or engaging with your page then it’s highly unlikely that Google will rank it favorably for the search terms your keywords and content focus on. Tracking these metrics yourself for the landing pages you want to rank will help you to determine whether a page is optimized for SEO and, if not, what you can change to help it rank better.
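If you’d like to see how these engagement metrics are calculated, the sketch below works them out from a handful of made-up session records. The field names and numbers are hypothetical; in practice the data would come from whatever analytics tool you use, which may define a bounce slightly differently.

```python
# Hypothetical session records; real data would come from your analytics tool.
SESSIONS = [
    {"landing_page": "/products/oat-milk", "pages_viewed": 1, "seconds_on_page": 10},
    {"landing_page": "/products/oat-milk", "pages_viewed": 3, "seconds_on_page": 90},
    {"landing_page": "/products/oat-milk", "pages_viewed": 1, "seconds_on_page": 20},
    {"landing_page": "/products/oat-milk", "pages_viewed": 2, "seconds_on_page": 80},
    {"landing_page": "/blogs/news", "pages_viewed": 1, "seconds_on_page": 5},
]


def engagement(sessions, landing_page):
    """Summarize visits, bounce rate, and average time for one landing page."""
    visits = [s for s in sessions if s["landing_page"] == landing_page]
    if not visits:
        return {"visits": 0, "bounce_rate": None, "avg_seconds": None}
    # A bounce here means the visitor viewed only the landing page, then left.
    bounces = sum(1 for s in visits if s["pages_viewed"] == 1)
    total_seconds = sum(s["seconds_on_page"] for s in visits)
    return {
        "visits": len(visits),
        "bounce_rate": bounces / len(visits),
        "avg_seconds": total_seconds / len(visits),
    }


print(engagement(SESSIONS, "/products/oat-milk"))
```

For the sample oat milk page, two of the four visits were single-page sessions, giving a bounce rate of 50% and an average of 50 seconds on the page.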

----

Search engines are always updating their algorithms for how they discover, store, and display content to users, which can be tricky to keep on top of. However, the underlying principles of crawling, indexing, and ranking remain fairly consistent. If you’re able to understand these three functions and how they relate to your Shopify store, then you can develop a solid, well-rounded SEO strategy.