Google API leak: What happened and what does it mean for ecommerce?
For as long as Google and the practice of SEO have existed, amateurs and experts alike have tried to work out the mysteries behind Google’s algorithm. Over the years we’ve been given clues and hints, as well as advice on how best to prepare content for ranking. However, it’s rare that something spills out into the public domain that wasn’t supposed to.
That’s why the world of SEO has recently been lit up with discussion and debate over a major leak from Google. Does the information contained in the leak align with or contradict what we thought we knew about Google and page ranking? Does it give us any more insight into what we should be doing to appeal to Google’s algorithm?
Today we’re going to take a quick look at what exactly happened with the Google API leak, and what it might mean for ecommerce SEO.
Google API Leak: What happened exactly?
Back in mid-March, thousands of documents appeared on GitHub, added by an automated bot. These appeared to be from Google’s Content API Warehouse. By early May, the documents had been shared with SparkToro co-founder Rand Fishkin, and anonymous sources and ex-Google employees confirmed their authenticity. Some of the contents contradicted things Google had said in the past, and taken together they offer the largest set of confirmed insights into the algorithm to date.
It is important to note, however, that despite the authenticity of the documents it is possible that some of the contents of the leak aren’t currently relevant. Fishkin has cautioned that:
“I would urge…not to point to a particular API feature in this leak and say: ‘SEE! That’s proof Google uses XYZ in their rankings.’ It’s not quite proof. It’s a strong indication, stronger than patent applications or public statements from Googlers, but still no guarantee.”
That being said, it is still the closest thing we’ve ever had to real, tangible evidence about ranking algorithms. Up until the leak, most of what we knew about Google’s ranking algorithm came from clues Google offered, or from experimentation and speculation on the part of SEO experts. Now that Fishkin and iPullRank CEO Michael King have analyzed over 2,500 documents, we have real evidence direct from Google itself. It’s certainly one of the biggest events in SEO history to date, but what do we now know?
What are the highlights from the Google API leak?
With over 2,500 pages to analyze, there’s a lot to learn from the Google API leak. Rand Fishkin and Michael King have done much of the initial deep diving into the documentation and have come through with some interesting highlights and insights. Other SEOs, perhaps with more technical expertise, may come forward with even more as time goes on. For now, here are some of the major highlights from Fishkin and King.
Chrome browser clickstreams power Google Search
Fishkin’s source claims that as early as 2005, Google wanted the full clickstream of millions, even billions, of internet users, and that the Chrome browser made this possible. The API leak suggests that Google calculates many different metrics using Chrome views of individual pages and of full domains. Essentially, Google uses clicks in Chrome browsers to help determine ranking and other factors on SERPs.
One example Fishkin points to is the creation of sitelinks. These are the links to other pages on the same site that appear below the main search result.
It’s likely that Google uses the number of clicks on pages in Chrome to determine the most important URLs. Essentially, they use that click data from Chrome users to calculate which links to include in the sitelinks snippet.
NavBoost, Clicks, and Quality
NavBoost is a ranking factor that was discussed during Google’s DOJ antitrust trial, which started in September 2023 and concluded back in May. It improves results for navigational queries, using different signals to determine the most relevant results. The other side of the coin is called Glue, which determines all the other elements on SERPs. NavBoost therefore points to the importance of navigation and user-friendliness when building out webpages.
What we’ve now learned from the leak is how NavBoost relates to the way Google measures clicks. We’ve known for a while that Google measures clicks, but now we know the terms they’re using and have some insight into how they’re used as metrics. We’ve got “goodClicks”, “badClicks”, and “lastLongestClicks”, for example. Google measures the length of clicks, which allows it to categorize and filter results based on them. For example, if a user clicks on a link and immediately goes back to the SERP, that’s considered a bad click because they were clearly unsatisfied with the result. Google also measures things like how long it has been since a user last clicked on a webpage and spent time on it, over a span of around 13 months.
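To make the idea of click length more concrete, here is a minimal Python sketch of how clicks in one search session could be sorted into good and bad. It is only an illustration: the leak names the metrics but not how they are calculated, so the time cutoffs, the function names, and the reading of “last longest click” as the longest visit in a session (with ties going to the later click) are all our own assumptions.

```python
from collections import Counter

# Hypothetical cutoffs for illustration only; the leak does not publish thresholds.
BAD_CLICK_MAX_SECONDS = 10
GOOD_CLICK_MIN_SECONDS = 30


def classify_click(dwell_seconds):
    """Label a click by how long the visitor stayed before returning to the SERP."""
    if dwell_seconds < BAD_CLICK_MAX_SECONDS:
        return "bad"
    if dwell_seconds >= GOOD_CLICK_MIN_SECONDS:
        return "good"
    return "neutral"


def summarize_session(clicks):
    """Summarize one search session.

    clicks: list of (url, dwell_seconds) tuples in the order they were clicked.
    """
    counts = Counter(classify_click(dwell) for _, dwell in clicks)

    # Assumed reading of "last longest click": the result with the longest
    # visit in the session, with ties going to the later click.
    last_longest = None
    longest = -1
    for url, dwell in clicks:
        if dwell >= longest:
            longest = dwell
            last_longest = url

    return {
        "good": counts["good"],
        "bad": counts["bad"],
        "last_longest": last_longest,
    }


if __name__ == "__main__":
    session = [("a.example", 3), ("b.example", 45), ("c.example", 45)]
    print(summarize_session(session))
```

In this made-up session, the first result is a bad click (the searcher left after three seconds), while the other two count as good clicks.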
Google also uses clicks to determine the weight of a link. There are three tiers for links - low, medium, and high quality - and Google uses click data to choose a tier. For example, if Page 1 has no clicks, it’s categorized as low quality and its links are simply ignored. On the other hand, if Page 2 has high click volume according to click data from Chrome, it will be indexed as high quality. It’s important to note that Page 1 wouldn’t harm the entire domain in any way. That low-quality page will simply be ignored.
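The three link tiers can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Google’s method: the click cutoffs (`medium_min`, `high_min`) and the function names are hypothetical, since the leak describes the tiers but not the numbers behind them.

```python
def link_tier(page_clicks, medium_min=1, high_min=1000):
    """Place a linking page into a tier based on its click volume.

    medium_min and high_min are hypothetical cutoffs for illustration.
    """
    if page_clicks < medium_min:
        return "low"
    if page_clicks >= high_min:
        return "high"
    return "medium"


def counted_links(links):
    """Keep only links from pages above the low tier.

    links: list of (source_url, page_clicks) tuples.
    Low-tier links are dropped (ignored), not penalized.
    """
    return [
        (url, link_tier(clicks))
        for url, clicks in links
        if link_tier(clicks) != "low"
    ]


if __name__ == "__main__":
    links = [("page-1.example", 0), ("page-2.example", 25000), ("page-3.example", 40)]
    print(counted_links(links))
```

Note that the page with no clicks just drops out of the list. Nothing is subtracted from the other links, which mirrors the point that a low-quality page is ignored rather than held against the domain.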
When present, navigational user intent is more important than content and links
What the API leak has taught us so far is that Google takes navigation and user intent very seriously, and that these could in fact be more important than content. Google quickly learns from clicks which results search users want to see in a given area for specific terms. For example, if lots of people in Vancouver search for “snow boots” and click on pages that are much further down the SERP, Google will take notice and rank those pages higher. Other pages are unlikely to outrank them, even if they have more authority or belong to a larger site. Google notes the kinds of pages and sites that users in that area are clicking on, and the quality of those clicks, and adjusts ranking accordingly.
This is essentially the power of NavBoost we mentioned earlier. It puts location-based search and user intent at the top of the SEO to-do list.
What does the leak mean for ecommerce SEO?
It’s difficult to say at the moment what exactly the leak means for specific spaces online, including ecommerce. In all likelihood, the full implications of what was found will still be analyzed with new insights gleaned for months to come. What we do know, however, is how we can apply what has been found so far to how we approach ecommerce SEO.
Focus on building your brand, and the SEO returns will come later
Google is able to identify entities, then sort, rank, and filter them for SERPs. Entities are things like brands, and part of that entity includes the brand name, official websites and pages, social media, and so on. Basically, it’s a brand’s digital footprint - how well known they are, and their reach outside of search. Of course, if you’re a large, international brand, this means it’ll be easier to rank in SERPs. For everyone else, it means the returns on SEO won’t come quite so quickly.
So in order to improve your SEO, you need to focus on brand building. Creating, fostering, and promoting your brand so that it becomes notable within your space is invaluable. Doing so outside of the context of search will also help.
Big brands will dominate, so find your niche
Perhaps a more unfortunate takeaway from the API leak is that Google does prioritize larger brands and sites in SERPs. This means if a big brand is ranking high for certain terms, it’ll be pretty difficult to outrank them unless you’re also a big brand. To an extent this is something we already knew in SEO: a big brand is more likely to rank high. The leak confirms that this is definitely the case, and that it isn’t just about higher authority or better content. It’s about them simply being a big, well-known entity.
That makes it difficult for smaller or medium sized brands to compete on keywords. So it’s more important now than ever to find your niche and focus on those unique keywords.
Pay attention to click-based metrics, and where users go after your site
A big takeaway has been that clicks are a major metric for Google’s ranking system. It’s something they’ve confirmed and denied on and off over the years, but the leak makes it clear that clicks matter a lot. So if you aren’t already, it’s time to start paying closer attention to clicks on your store. Which links are people bouncing from quickly, and why? Where are they going instead? Which links have high click volume? Which pages are your customers spending a lot of time on, and which are they leaving quickly? A deeper understanding of this information will allow you to fine-tune underperforming pages and help them improve their index categorization and ranking.
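If you can export visit data from your analytics tool, a short script is enough to spot the pages people leave quickly. The Python sketch below is one way to do it, assuming a simple export of page paths and seconds spent on each page. The ten-second cutoff, the function name, and the sample data are all made up for illustration.

```python
from collections import defaultdict


def quick_exit_rates(visits, quick_seconds=10):
    """Return the share of visits to each page that ended quickly.

    visits: iterable of (page_path, seconds_on_page) tuples.
    quick_seconds: hypothetical cutoff for what counts as a quick exit.
    """
    totals = defaultdict(int)
    quick = defaultdict(int)
    for page, seconds in visits:
        totals[page] += 1
        if seconds < quick_seconds:
            quick[page] += 1
    return {page: quick[page] / totals[page] for page in totals}


if __name__ == "__main__":
    visits = [
        ("/snow-boots", 4),
        ("/snow-boots", 120),
        ("/sale", 3),
        ("/sale", 5),
    ]
    rates = quick_exit_rates(visits)
    # List the pages with the highest quick-exit share first.
    for page, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
        print(page, rate)
```

Pages at the top of the list are the ones to review first, since they are the most likely to be producing short, unsatisfied clicks.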
Consider content format as well as keywords
One thing from the leak that Michael King writes about is the notion that Google can limit how much of a specific content format appears in a SERP. For example, they might say only four blog posts can appear for a given keyword. So if you go to the SERP for a target keyword and find a lot of one content format, another piece of content in that same format may be less likely to rank highly. To stand out, consider a format that isn’t already crowding the page, whether that’s a blog, FAQ, video, or landing page.
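A quick way to apply this is to tally the formats on the first page for a target keyword. The Python sketch below assumes you have labeled each result by hand (or with a tool of your choice) as a blog, video, FAQ, and so on. The cap of four simply mirrors the example above and is not a confirmed Google number, and the function name is our own.

```python
from collections import Counter


def crowded_formats(serp_results, cap=4):
    """Return the content formats that already appear `cap` or more times.

    serp_results: list of dicts with a "format" key, labeled by you.
    cap: hypothetical per-format limit, taken from the example in the text.
    """
    counts = Counter(result["format"] for result in serp_results)
    return sorted(fmt for fmt, count in counts.items() if count >= cap)


if __name__ == "__main__":
    serp = [
        {"url": "one.example", "format": "blog"},
        {"url": "two.example", "format": "blog"},
        {"url": "three.example", "format": "video"},
        {"url": "four.example", "format": "blog"},
        {"url": "five.example", "format": "blog"},
        {"url": "six.example", "format": "landing page"},
    ]
    print(crowded_formats(serp))
```

Any format this returns is one where a new piece of content may struggle to break in, so a different format could be the better bet for that keyword.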
—-
It may be some time before we fully understand the implications of the Google API leak, especially when it comes to specific areas like ecommerce SEO. What we’ve learned so far has in some ways confirmed what we already knew, but it has also given valuable insight into what is actually important to Google. Over time more experts will analyze the documents, and we may learn even more about ranking systems.