In this chapter, we explore how Google reverse engineers user intent using machine learning as well as neural matching and RankBrain to influence its Search Engine Results Pages.
Now that we’ve looked at how Google crawls and indexes websites, let’s take a closer look at how Google presents the search results to users.
From the moment you start to type your query, Google’s cogs kick into motion to find the information that is most relevant to your query.
On average, your query will have travelled 1,500 miles to return your answer, hitting many of Google’s data centers across the world along the way, at a speed that’s incredibly close to the speed of light.
Google is constantly changing the way it displays the results for a search query, but one thing remains constant: the goal is and always has been to provide users with what they’re looking for quickly.
In fact, hitting the top position in the search results is no longer always the best option. Back in 2014, a number one search ranking would usually yield about 31 percent of clicks in that SERP (search engine results page). This is no longer the case as Google continues to add more elements to the SERPs.
In this section, we’ll explore the various types of search results that Google displays, and later on in the textbook, we’ll explore how Google actually decides which results to display.
Reverse Engineering User Intent With Machine Learning
Google has made a lot of progress in providing users with the most relevant answers to their queries quickly, but importantly, it’s the format in which these answers are presented that offers even more value.
This all depends on the type of search – some results pages are constantly changing whilst others remain consistent. For instance, if you’re searching for the latest score of a football match, Google needs to perform second-by-second updates whereas the search results for a historical figure may remain the same for a long time.
People Also Ask
When performing a search, you’ll likely have seen the People Also Ask boxes in the search results. They look like this:
Clicking on any one of these questions will reveal specific details as well as further related questions which are added to the bottom of the list.
These question-answer boxes are essentially Google’s way of reverse engineering the user intent (purpose) of your query and predicting what else you might be interested in.
This is achieved through the power of machine learning, a branch of computer science where a computer teaches itself how to perform a specific task based on predefined training data.
For example, if you wanted to teach a program to identify images of cars, you would include lots of images of different cars in the training data.
Google leverages relational data (similar to the above example) to gain a better understanding of language and search queries, which helps it provide the most useful search results. Google has an ML tool called Expander, a graph-based system that looks at the relationships between different pieces of data (in this case, search queries).
Below is a screenshot of a presentation from Google I/O 2016 which focuses on Google’s Breakthroughs in Machine Learning.
From the example given in the talk, we can see that there is a strong relationship between the queries “What are the traditions of halloween” and “origins of trick or treating” – this is represented by a solid (rather than a dotted) edge between the two nodes in the graph.
Essentially, Expander will have determined that the two queries are semantically similar, in that they both satisfy the user’s intent to learn about halloween traditions.
This model becomes truly effective with searches that Google has never seen before. In fact, 15% of the queries that Google processes are ones it has never encountered. In these cases, the machine learning algorithm works out the search intent of the known query that is closest to the “new” query, and presents the user with what it judges to be the most relevant results.
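To make the idea of matching a “new” query to the closest known one more concrete, here is a minimal sketch. It is not how Expander works internally (Expander is graph-based, and Google has not published its code); it simply uses Jaccard word overlap as a stand-in similarity measure. The `KNOWN_QUERIES` store, the intent labels and the function names are all hypothetical.

```python
def tokenize(query):
    """Lowercase a query and split it into a set of words."""
    return set(query.lower().split())


def jaccard(a, b):
    """Share of words that two token sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical store of previously seen queries, each with an intent label.
KNOWN_QUERIES = {
    "what are the traditions of halloween": "halloween traditions",
    "origins of trick or treating": "halloween traditions",
    "how tall is the eiffel tower": "eiffel tower facts",
}


def closest_known_query(new_query):
    """Return the (known query, intent) pair that best overlaps new_query."""
    new_tokens = tokenize(new_query)
    best = max(KNOWN_QUERIES, key=lambda known: jaccard(new_tokens, tokenize(known)))
    return best, KNOWN_QUERIES[best]


match, intent = closest_known_query("halloween traditions in america")
print(match, "->", intent)
```

Word overlap only captures queries that share vocabulary. A production system would rely on learned representations so that queries with no words in common can still be linked.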
Infinite PAA Boxes
As mentioned above, when you click on one of the “People also ask” questions, the question itself expands to take up more screen space. This provides a snippet of the answer to the question, but interestingly, also shows additional questions added to the bottom of the list.
From our “where is lapland” example above, if we click on the question “Is Lapland in the North Pole”, we can see that two new questions appear.
- Does Santa have kids?
- How much does it cost to go to Lapland?
You may be wondering why Google has included the first question, as it doesn’t necessarily “fit” with the others. Well, the answer to the original question mentions Santa Claus and Father Christmas a number of times, so Expander will have drawn a parallel between the two topics and concluded that there is a strong connection between them.
Clicking on yet another question (in this case, “Does Santa have kids?”), takes us further down the rabbit hole…
This is repeated for each question that is clicked on, with seemingly no limit. Importantly, each click pushes the traditional organic search results further down the page, which raises the question: should ranking in the top position of the organic rankings always be my priority?
As seen above, the simple answer is no.
Related Searches
Sometimes it’s difficult to choose the right words for a search query, especially when your search refers to a topic that you don’t know much about.
For example, if you were to search for “mars”, Google may return related queries like “mars god of war”, “mars chocolate bar”, “mars planet” etc. These related search terms help users find and explore the information that is related to their original query.
More importantly though, considering that “mars” is an incredibly short query with little to no context given, the related queries allow users to further specify the information they need. In other words, if the user saw the related term “mars god of war” and clicked on it, they are signalling to Google that their interest lies in finding out more about the Roman god as opposed to the chocolate bar or planet.
In most cases, when you enter a search query and hit enter, you’ll see some suggested or related searches at the bottom of the search results page.
Coming from Google themselves, these related searches are “designed to help you navigate to related topics if you didn’t find what you were looking for, or to explore a different dimension of a topic”.
As with PAA (People Also Ask) results, the related searches offer an insight into how Google is reverse engineering the search intent behind a keyword phrase using machine learning techniques.
How Google Generates Related Searches
Google has been granted a patent specifically for this titled “Generating query suggestions using contextual information”.
There are several ways that a search engine may identify potential query suggestions that are related to the user’s original search term.
The first is to simply look at the search engine’s log files for potential query suggestions, i.e. go through the list of all known search queries and pick out the ones that are most similar to the original query.
An alternative involves looking at the number of times that the terms show up in web pages that are found in the search results, or search result snippets for a particular query.
Let’s look at how the process may actually work as outlined by the patent.
- The search engine receives a query from the searcher and returns the organic results in response to the query.
- At the same time, the engine identifies the most frequently used terms in these pages and chooses the highest-weighted (most important) terms from each of the pages.
- A check is then made to see if this process has already been completed for this query. If it has, the engine simply retrieves the collections of terms (referred to as centroid repositories in the patent) and compares them to determine which are still relevant to the query. This is necessary because search results are always changing, so the terms that have already been collected may become outdated over time.
- Once the most relevant queries have been sorted (from the collections), the search engine examines them for potential related search queries.
- The search engine then checks whether the candidate query suggestions contain any new terms that were not included in the original query, and adds those that do to a set of finalised suggestions.
- Finally, the related suggestions are provided to the user in response to the original query.
During step 2 of the process, Google is using machine learning and natural language processing techniques to uncover the semantics behind the original search term so that it can then select the most relevant and useful related searches to the user.
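The term-weighting steps can be sketched roughly as follows. This is a simplified illustration rather than the patent’s actual method: it treats raw word frequency as the “weight”, and the snippets, the stopword list and the function names are invented for the example.

```python
from collections import Counter

# Small, hand-picked stopword list (an assumption for this example).
STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def top_terms(snippet, query_terms, limit=3):
    """Most frequent terms in one result that are not already in the query."""
    words = [w.strip(".,?!:").lower() for w in snippet.split()]
    counts = Counter(
        w for w in words if w and w not in STOPWORDS and w not in query_terms
    )
    return [term for term, _ in counts.most_common(limit)]


def related_terms(query, snippets, max_terms=3):
    """Rank new terms by how many results list them among their top terms."""
    query_terms = set(query.lower().split())
    votes = Counter()
    for snippet in snippets:
        for term in top_terms(snippet, query_terms):
            votes[term] += 1
    return [term for term, _ in votes.most_common(max_terms)]


# Invented result snippets for the query "where is lapland".
SNIPPETS = [
    "Lapland holidays and flights: book holidays and flights to Lapland in Finland",
    "Santa holidays in Finland: meet Santa on Finland holidays",
    "Cheap flights to Finland: compare flights to Finland",
]

print(related_terms("where is lapland", SNIPPETS))
```

With these made-up snippets, “holidays”, “flights” and “finland” each appear among the top terms of two results, so they become the candidate terms for building related searches.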
In the context of our example for “where is lapland”, we can see that various word phrases like “flights to” and “holidays” are in bold. This implies that the algorithm identified them as being highly relevant to the original search query because they appeared the most frequently within the search results.
If we look at the top ranking pages for the keyword, we can see why.
Google believes that people who want to know where Lapland is, likely also want to visit it, or find more information about it. This is reflected in the search results which focus on travelling and/or visiting Lapland.
Oh, and, if you’re wondering why “lapland santa” is a related search, take a closer look at the meta description for one of the results.
“A question often asked by many is “Where is Lapland?”. Where in the earth is the home of the Santa Claus? Lapland is situated on the arctic circle, in Finland.”
Of course, this isn’t the only way that Google determines what the related search suggestions should be. More recent papers and patents describe how search engines use their query logs, looking at other queries that the user searched for within the same session, pages that were clicked on during the query session, and so on.
How Are Related Searches Represented?
We’ve seen how Google may generate the related search queries, but let’s take a look at how these search queries are represented in relation to the user’s original search term.
Google details how this may be achieved in another patent titled “Clustering query refinements by inferred user intent” which was granted in 2017.
Each query is modelled and represented as a graph, where the original query is treated as the root (or head) node of the graph and subqueries are represented as subsequent nodes as seen above.
Considering that each query will have hundreds if not thousands of related search terms, Google needs a way to only choose to display a few of the most important and relevant queries. With that in mind, the patent defines “query refinements” as a particular kind of related query that are “obtained by finding queries that are most likely to follow the original query in a user session”. In other words, these are the search terms that have the same or similar user intent as the original search term. These refinements are grouped into clusters that represent distinct information needs.
From our “mars” example at the top of this section, this essentially means that related queries about the Roman god, the chocolate bar and the planet Mars, would all be organised into their own clusters.
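As a rough illustration of grouping refinements into intent clusters, the sketch below assumes that two refinements share an intent when searchers click on many of the same results for both. That signal, the example URLs, the threshold and the greedy grouping strategy are all assumptions made for this example, not details confirmed by the text above.

```python
def cluster_refinements(clicked_docs, threshold=0.3):
    """Greedily group refinements whose clicked results overlap enough."""
    clusters = []  # each cluster holds its queries and the union of their docs
    for query, docs in clicked_docs.items():
        for cluster in clusters:
            union = docs | cluster["docs"]
            overlap = len(docs & cluster["docs"]) / len(union) if union else 0.0
            if overlap >= threshold:
                cluster["queries"].append(query)
                cluster["docs"] |= docs
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append({"queries": [query], "docs": set(docs)})
    return [cluster["queries"] for cluster in clusters]


# Hypothetical click data: refinement -> set of results that users clicked.
CLICKED = {
    "mars planet": {"nasa.example/mars", "space.example/mars"},
    "mars god of war": {"myths.example/mars", "rome.example/gods"},
    "mars distance from sun": {
        "nasa.example/mars",
        "space.example/mars",
        "astro.example/orbits",
    },
    "mars chocolate bar": {"sweets.example/mars-bar"},
    "mars roman mythology": {"myths.example/mars"},
}

for group in cluster_refinements(CLICKED):
    print(group)
```

Here the planet, the Roman god and the chocolate bar each end up in their own cluster, mirroring the “mars” example.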
From a webmaster’s standpoint, although you can’t actively influence what appears in the related searches (it’s all algorithmic), looking at these related searches gives great insights into your audience and their intent for particular queries.
Predictive Search: Autocomplete
When you start typing a search query, Google’s autocomplete feature kicks into motion with the aim of helping you complete your search faster.
From the example above, you can see that typing the letters “new” brings up predictions such as “new york” or “new google account” making it easy for you to finish entering your query without having to type out all of the letters.
Designed to improve the search experience, the autocomplete feature is especially useful for mobile users, as typing on a smaller screen can be difficult. According to an article published by Google, autocomplete “reduces typing by about 25 percent”.
In fact, they go as far as estimating that autocomplete saves over “200 years of typing time per day”.
How Autocomplete Differs From Related Searches
It’s worth noting that these are different to Google’s related search suggestions which we covered previously in this module. Both of these features are designed to improve the user’s search experience in different ways: autocomplete helps people complete a search they were intending to make whereas the related searches feature helps users to continue their search experience by suggesting new types of searches to explore.
How Google Determines the Predictions
Simply put, Google bases the predictions on real searches that users have already made, and then displays the most common and trending queries that match the characters already entered by the user. The predictions adapt in response to new characters that are entered into the search box. So, for example, going from “new” to “newt” would cause the feature to suggest an entirely new set of queries.
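The core idea can be sketched in a few lines. This is a toy illustration, not Google’s implementation: the query log and its counts are invented, and a real system would use an index built for prefix lookups rather than scanning every query.

```python
# Hypothetical query log: query -> number of times it has been searched.
QUERY_COUNTS = {
    "new york": 900,
    "news": 800,
    "new google account": 400,
    "newton's laws": 300,
    "newt": 120,
}


def predict(prefix, counts=QUERY_COUNTS, limit=3):
    """Return the most-searched past queries that start with the typed prefix."""
    prefix = prefix.lower()
    matches = [query for query in counts if query.startswith(prefix)]
    return sorted(matches, key=lambda query: counts[query], reverse=True)[:limit]


print(predict("new"))   # ['new york', 'news', 'new google account']
print(predict("newt"))  # ["newton's laws", 'newt']
```

Adding a single character changes the prefix, so the set of matching queries (and therefore the predictions) changes with it.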
Interestingly, Google also looks at your location and previous search history to determine which queries it presents. This means that the predictions that person A sees will likely be different to what person B sees.
In fact, a patent granted to Google in 2015 titled “Method and system for autocompletion using ranked results” specifically focuses on how Google’s predictions differ from user to user based on several different filters and signals.
Google introduces the concept of a “fingerprint” to determine the predictive search query suggestions for each query that the user is typing.
Each search query may have various different fingerprints which are based on factors like:
- Information from the user’s profile like their location
- The language used for the search query
- Information based on the user’s previous search behaviour i.e. previous search queries
- The type of device that is being used, e.g. mobile, tablet or desktop
- The speed and type of the connection – Google may display fewer suggestions if the quality of the connection is poor.
- And many more!
Some predictive queries will not be displayed by Google for various reasons, e.g. they may be offensive or shocking to the user.
Below is a list of the different types of filters that are applied to the predictive queries:
- Privacy Filter – filters out search terms that haven’t been searched by a certain number of unique searchers.
- Infrequently Submitted Query Filter – filters out search terms that are infrequently submitted and probably will not be selected by a user.
- Appropriateness Filter – filters out certain queries based on factors like the inclusion of particular keywords within a query, and/or the content on the search results page for that query.
- Freshness Filter – eliminates any suggestions that may have been submitted earlier than a particular historical point in time. The freshness may be hours, days, weeks, months or years. In other words, if a query was very popular in 2019, but not so much in 2020, then the predictive algorithm may choose to ignore it.
- Anti-Spoofing Filter – created to screen out certain queries, e.g. artificially generated queries or URL submissions. For example, the filter may ignore multiple submissions of the same query from the same user.
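The filters above can be pictured as a chain of checks that each candidate prediction must pass. The sketch below is illustrative only: the `Candidate` fields, the thresholds and the blocked-word list are assumptions made for the example, and the anti-spoofing step is assumed to have happened earlier, when the counts were collected.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Candidate:
    text: str
    unique_searchers: int  # distinct users who submitted the query
    submissions: int       # total submissions (duplicates per user removed upstream)
    last_submitted: date


# Placeholder list; a real appropriateness filter would be far more involved.
BLOCKED_WORDS = {"blockedword"}


def passes_filters(candidate, today, min_searchers=50, min_submissions=100,
                   max_age_days=365):
    """Apply privacy, frequency, appropriateness and freshness checks in turn."""
    if candidate.unique_searchers < min_searchers:   # privacy filter
        return False
    if candidate.submissions < min_submissions:      # infrequently submitted
        return False
    if BLOCKED_WORDS & set(candidate.text.lower().split()):  # appropriateness
        return False
    if today - candidate.last_submitted > timedelta(days=max_age_days):  # freshness
        return False
    return True


today = date(2020, 3, 1)
candidates = [
    Candidate("coronavirus symptoms", 5000, 20000, date(2020, 2, 28)),
    Candidate("my private address", 1, 3, date(2020, 2, 20)),
    Candidate("corn maze 2019", 400, 900, date(2018, 10, 1)),
]
kept = [c.text for c in candidates if passes_filters(c, today)]
print(kept)  # ['coronavirus symptoms']
```

The second candidate fails the privacy check (too few unique searchers) and the third fails the freshness check (last submitted more than a year before `today`).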
The anticipated queries shown to the user may be based upon previous queries, from other searchers. This refers to the concept of “dictionaries” that is mentioned in the patent – i.e. past queries used are cached in a dictionary which is then used for easy retrieval (for similar searches).
Depending on the user profile (i.e. the information that Google has about your search history etc), the predictive algorithm would decide which dictionary to use. This is another way that the results from the predictions will be personalised to you. These predictive searches could be made up of either or both commonly submitted searches, and recent searches that could still be cached by the search engine.
Each entry in the dictionary would have certain metrics that help prioritise which queries should be displayed to the user, for example a popularity value based on how popular the search term is at that particular time. For instance, in March 2020, typing “cor” yields lots of suggestions for “coronavirus” as it’s a trending news story, whereas the same search six months earlier would have produced totally different results.
For some of the predicted queries, Google may also cache the actual search results in anticipation of the user selecting one of them. This helps speed up the search process so that the results are displayed almost immediately.
This links to Google’s aim to try and understand the searcher’s intent so as to improve the experience. For instance, with the “cor” example above, Google has prioritised the queries related to “coronavirus” because of the surge in popularity for these search terms.
How RankBrain and Neural Matching Influence the SERPs
As Google says, “useful responses take many forms”.
When you perform a search, Google aims to connect you to the most useful information as quickly as possible. Therefore, each search term warrants a different presentation in the SERPs. Google is able to understand and adapt the search results based on this with the help of its RankBrain algorithm and neural matching.
Both RankBrain and neural matching have caused some confusion within the SEO community, so here’s a simple but effective breakdown of how they differ, from Danny Sullivan via Google’s Search Liaison Twitter account:
- RankBrain helps Google better relate pages to concepts – This means Google can better return relevant pages even if they don’t contain the exact words used in a search, by understanding the page is related to other words and concepts.
- Neural matching helps Google better relate words to searches – For example, neural matching helps Google understand that a search for “why does my TV look strange” is related to the concept of “the soap opera effect.” We can then return pages about the soap opera effect, even if the exact words aren’t used.
Now, let’s take a look at the different types of SERP features that Google may display to the user, with the help of these two concepts.
Featured Snippets
A common format that Google employs to quickly provide users with the information they’re looking for is the featured snippet.
Featured snippets are programmatically generated snippets from web pages that have been specifically identified by Google’s systems to contain the information that the user is looking for.
All featured snippets include:
- A snippet of information quoted from a third party website
- A link to the page
- The page title
- URL of the page
As you can see from the example above for the keyword “best way to melt chocolate”, Google answers the question with a snippet that is displayed above all of the other pages in the organic search rankings.
Where do Featured Snippets Come From?
The content is usually pulled straight from a page that ranks on the first page of the results, but on the odd occasion, it may also contain text from pages ranking lower down in the SERPs.
Google’s automated systems look at the web listings for the query and determine if it would be useful to highlight one of them as a featured snippet listing.
According to a featured snippets study by Ahrefs, 99.58% of featured snippet pages already rank in the top 10 positions of the search results for a particular search query. So if you’re ranking in these positions, your chances of being represented as a featured snippet are pretty high.
If you want to truly dominate the SERPs, by not only ranking in the top organic positions, but also securing a spot in position zero, then Google’s emphasis on creating great content shouldn’t be ignored.
Types of Featured Snippets
There are three main types of featured snippets:
Paragraph – Google provides a short answer to the search query. Some may even be accompanied by an image or multiple images.
List – Here, Google provides the answer to the query in list format.
Table – Unsurprisingly, table snippets are presented by Google as a table.
However, as we’ve seen with the example from the previous section, there are also other forms of featured snippets, such as videos.
Featured Snippets Policies
Featured snippets not only have a unique formatting and positioning, but they are also often spoken aloud by the Google Assistant during voice searches. Because of how prominent and valuable they are, Google has published specific standards for what may or may not appear as a featured snippet.
Any snippets that violate these policies by being…
- Sexually explicit
- Lacking expert consensus on public interest topics
…are automatically excluded and not displayed by Google’s systems.
Featured Snippets for Mobile & Voice Search
Mobile search traffic has long since surpassed desktop traffic, and with voice-activated search through digital assistants growing too, the traditional search results do not work as well in these contexts as they do on desktop.
This makes featured snippets an ideal format for mobile and voice based search.
That being said, Google still displays the organic search results, as featured snippets are designed to provide a quick answer to the search query.
The Roles of Neural Matching and RankBrain for Featured Snippets
With neural matching, Google’s focus on topics rather than specific keywords is evident in the content that is pulled for some featured snippets. Let’s look at an example. In the game of basketball, players must constantly bounce the ball while moving with it. Failing to do this is called “travelling” and is against the rules of the game.
We can see the power of neural matching at play when we look at the results for the keyword “walking with the basketball rule”. Despite the fact that we haven’t explicitly used the term “travelling”, Google is able to understand that the terms “walking with the basketball” and “rule” are referring to “travelling”.
The featured snippet shown (albeit a video) is about the rule against “travelling” with the basketball, even though the term “walking” is never actually mentioned.
What about RankBrain?
RankBrain helps Google identify the search intent (semantics) of a keyword. In other words, if Google sees that two keywords are equivalent (i.e. they have the same intent) it needs to determine the relevant results for these terms.
In the context of our basketball example, neural matching has made the connection between “walking with the basketball” and “travelling”, so RankBrain infers what type of results should be displayed to the user based on that intent.
For instance, when users search for “travelling”, apart from finding out more information about the rule itself, they may also want to know how the game has changed over time as a result of the rule, or how to officiate the rule.
To summarise, neural matching determines what the concepts of the query are and RankBrain determines what (pages) relate to these concepts.
The Knowledge Graph
With the help of webmasters like yourselves, Google is building the largest database of information and knowledge of all time. So without you, Google wouldn’t be where it is today.
Google’s aim is to provide users with accurate and relevant information as quickly as possible, so in order to do this, they launched The Knowledge Graph in 2012.
The Knowledge Graph is a reflection of Google’s understanding of the facts about just about everything in the world, from places and people to events and scientific concepts.
Types of Knowledge Graphs
The Knowledge Graph allows users to search for things, people or places that Google knows about and instantly get information that’s relevant to the query. This includes everything from landmarks, cities and celebrities to sports teams, movies, celestial objects and even works of art.
Let’s take a look at some examples of what Google calls Knowledge Cards:
People – Marie Curie
Landmarks – The Taj Mahal
Sports Teams – England Football Team
Movie – Parasite
Celestial Object – Jupiter
From the above, we can see that Google provides accurate and relevant information about each of these real-world entities; for “Parasite”, we’re given the name of the director whereas for “Jupiter” we’re told how far away the planet is from the Sun etc.
Apart from Knowledge Cards, there are also various other types of Knowledge Graphs.
For example, if you search for “famous musicians from Liverpool”, Google notably displays a picture carousel at the top of the page instead of a single Knowledge Card on the right hand side.
Clicking on one of these will direct you to the Knowledge Card of the person.
Likewise, if you search “who is the CEO of Google”, Google displays a knowledge graph directly beneath the search box, telling us that it is “Sundar Pichai”, with some more information about him (in this case, from Wikipedia).
So, how is Google getting this information?
How the Knowledge Graph Works
The Knowledge Graph automatically maps the attributes of, and connections between, real-world entities from a range of sources. The information is gathered from the web (thanks to webmasters like you), structured databases and licensed data sources such as Freebase, Wikipedia and IMDB (Internet Movie Database).
In an introductory video that Google released at the launch of the Knowledge Graph in 2012, Product Management Director Jack Menzel opens with the following question: “Wouldn’t it be amazing if Google understood that the words that you use when you’re doing a Search, well they aren’t just words, they refer to real things in the world?”.
This pretty much explains (in simple terms) how the Knowledge Graph works: it’s about mapping the words in search queries to real-world entities and (importantly) the relationships between them.
For example, Google lists Marie Curie as a person in the Knowledge Graph. It also shows us that she had two children, one of whom also won a Nobel Prize. Her husband (Pierre Curie) also won a Nobel Prize. All of these entities are linked together in Google’s graph.
It’s about things (and their relationships), not strings.
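One common way to picture “things, not strings” is as a set of subject, predicate, object triples. The sketch below is a toy model rather than Google’s actual data structure; the predicate names (`type`, `spouse`, `child`, `award`) and helper functions are made up for illustration.

```python
# Each fact links two entities (or an entity and a value) through a relationship.
TRIPLES = [
    ("Marie Curie", "type", "Person"),
    ("Marie Curie", "spouse", "Pierre Curie"),
    ("Marie Curie", "child", "Irène Joliot-Curie"),
    ("Marie Curie", "child", "Ève Curie"),
    ("Marie Curie", "award", "Nobel Prize"),
    ("Pierre Curie", "award", "Nobel Prize"),
    ("Irène Joliot-Curie", "award", "Nobel Prize"),
]


def objects(subject, predicate):
    """All values linked to `subject` through `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def children_with_award(parent, award):
    """Follow two relationships: parent -> child, then child -> award."""
    return [child for child in objects(parent, "child")
            if award in objects(child, "award")]


print(objects("Marie Curie", "spouse"))                  # ['Pierre Curie']
print(children_with_award("Marie Curie", "Nobel Prize"))  # ['Irène Joliot-Curie']
```

Because the facts are stored as relationships between entities, a question like “which of Marie Curie’s children won a Nobel Prize?” becomes a matter of following edges rather than matching keywords.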
Let’s dive deeper.
Google has been granted several patents which enable them to identify, recognise and predict entities from a search query or piece of content.
- Entity identification model training – this patent focuses on recognising entities and predicting when parts of sentences might refer to those entities.
  Here’s an example to help illustrate this better. If we take the sentence “In 1890, the President of the United States was Benjamin Harrison”, then the entity text “President of the United States” references the person “Benjamin Harrison”.
  “For each complete sentence, the entity identification system emulates typing the sentence and providing portions of the sentence to an entity identification model. The entity identification model determines a predicted entity for each portion of a sentence that it receives as input.”
  So for instance, one of the inputs may be “In 1890, the P” and the model would then try to determine what the entity is. The ‘P’ could refer to anything from “popular paintings,” “Puerto Rico census,” or “printing press” to “President of the United States.”
  “Using that portion of the sentence as input, the entity identification model determines a predicted entity for that portion of the sentence. The predicted entity may be, for example, the printing press. After comparing the predicted entity, the printing press, to the known entity, Benjamin Harrison, the entity identification system updates the entity identification model. For example, the entity identification model may be updated to decrease the likelihood that ‘printing press’ would be identified as the entity for the input, ‘In 1890, the P.’”
  This way, the model automatically learns to recognise what the most relevant entity is.
- NLP-based entity recognition and disambiguation – as the name suggests, this patent details how the system “automatically determines which entities are being referred to by the text using both natural language processing techniques and analysis of information gleaned from contextual data in the surrounding text”.
  We will go into much more detail about Natural Language Processing in a later module, but essentially, NLP is a method used to improve a computer program’s understanding of language.
  Let’s take a look at a quick example of where NLP helps. Here’s a sentence: “I want to go to Paris and visit the Eiffel Tower”.
  We all know that this sentence is referring to Paris, the capital city of France. But did you know that there are over 20 cities in the U.S.A. called Paris? So, how is a computer program supposed to know which “Paris” is being referred to?
  NLP looks at the surrounding text, i.e. it would look at the term “Eiffel Tower”, to get a better idea of which Paris we are referring to.
  The patent goes into much more detail as to how entities are prioritised (e.g. by the frequency of the entity within the text) so that Paris, France is identified as the entity rather than Paris, U.S.A. because of its importance.
  However, for the purposes of this course, all we really need to know from this patent is that Google’s understanding of language greatly increases its ability to identify entities, which in turn greatly increases its ability to provide users with the information that they want.
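The “Paris” example can be sketched as a simple context-overlap score. This is a deliberately naive illustration rather than the patented method: the candidate entities and their clue words are hand-written assumptions, and real systems would also weigh factors such as entity importance and frequency.

```python
# Hypothetical candidate entities for the mention "Paris", each with a few
# words that tend to appear near it.
CANDIDATES = {
    "Paris, France": {"eiffel", "tower", "louvre", "seine", "france"},
    "Paris, Texas": {"texas", "lamar", "county", "usa"},
}


def disambiguate(sentence, candidates=CANDIDATES):
    """Pick the candidate whose clue words overlap most with the sentence."""
    context = {word.strip(".,!?").lower() for word in sentence.split()}
    scores = {name: len(context & clues) for name, clues in candidates.items()}
    return max(scores, key=scores.get)


print(disambiguate("I want to go to Paris and visit the Eiffel Tower"))
# Paris, France
```

The words “Eiffel” and “Tower” in the surrounding text match the clues for Paris, France, while nothing in the sentence points to Paris, Texas.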
The Role of Neural Matching and RankBrain: The Topic Layer
We’ve talked a lot about entities in this section, but it’s important to remember that Google also likes to emphasise the importance of topics and is constantly pushing to improve the user journey and experience.
In fact, the latter is one of its core goals for the next twenty years in Search.
And when it comes to topics, you know that neural matching and RankBrain are at play somewhere.
That’s exactly the case with Knowledge Graphs.
Google acknowledges that “every search journey is different, and especially if you’re not familiar with the topic, it’s not always clear what your next search should be to help you learn more.”
So, instead of forcing you to perform one search at a time, Google can “intelligently show the subtopics that are most relevant to what you’re searching for and make it easy to explore information from the web, all with a single search”.
If you’re wondering why we didn’t show an example like this before, it’s because we thought it would make more sense to include it here.
From the example above, we can see that when you search for the keywords “pugs” and “yorkshire terriers”, Google displays several different tabs like breed characteristics for “pugs” and grooming tips for “yorkshire terrier”. Importantly, we can see that the subtopics are specifically selected for each type of dog – which suggests that somehow, Google has selected the most common and relevant subtopics for each query.
With an increased emphasis on improving the user journey and focusing on topics, Google introduced the Topic Layer to its Knowledge Graph. The Topic Layer is designed to “understand a topic space and how interests can develop over time as familiarity and expertise grow”.
Therefore, neural matching and RankBrain help Google to identify the relationships between the user’s search query (e.g. “pugs”) and the subtopics (e.g. “grooming tips”), but also how these subtopics (concepts) relate to each other.
By treating the words from the search queries (and content on the web) as entities and topics, Google is better able to understand what you’re searching for, and in turn, is able to improve the search results by presenting the most relevant information.
Instant Answers: Google’s Answer Box
Google’s Instant Answer Box is an extension of Featured Snippets and The Knowledge Graph.
As we all now know, Google is constantly trying to interpret and understand the search intent behind the user’s query and, in turn, aims to provide the searcher with the answers they’re looking for as quickly as possible. The Answer Box helps to achieve this.
For example, if we search “how tall is the eiffel tower”, Google will tell us the answer immediately at the top of the search results.
In fact, you don’t even have to hit enter in order to get the answer – Google actually displays the answer whilst you’re typing!
Google teams up with businesses that are able to deliver the information and services that searchers want to find, and licenses their content to provide useful responses.
For example, Google lists the showtimes for your local cinema.
You can also find real time weather updates for any location.
This effectively renders the organic search results obsolete for these queries, as you don’t need to click through to them to find your answer – unless, of course, Google’s direct answer still doesn’t answer your query.
As with featured snippets, Google gets the answers to these questions from its index (and its extensive Knowledge Graph). A subtle difference here, however, is that the information is pulled from trustworthy and authoritative sources – after all, the last thing Google wants is to present users with incorrect or invalid answers!
A patent granted to Google (yes, another one) in 2017, titled “Corroborating facts in electronic documents”, describes how Google verifies and corroborates the answers against its Knowledge Graph and the data that it has indexed across the web, so that it can present these answers to users with a higher level of confidence.
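To make the idea of corroboration a little more concrete, here is a minimal sketch in Python. It is not the method described in the patent: it simply assumes that an answer is more trustworthy when several independent sources agree on it. The source names, the values and the `min_support` threshold are all made up for illustration.

```python
from collections import Counter

# Hypothetical values extracted for one fact from different sources.
# Both the domains and the values are illustrative, not real data.
SOURCES = {
    "site-a.example": "330 m",
    "site-b.example": "330 m",
    "site-c.example": "324 m",
    "site-d.example": "330 m",
}


def corroborate(values, min_support=0.6):
    """Return (answer, support) if enough sources agree, else (None, support).

    support is the share of sources that gave the most common value.
    min_support is an assumed threshold, chosen only for this example.
    """
    if not values:
        return None, 0.0
    answer, votes = Counter(values).most_common(1)[0]
    support = votes / len(values)
    return (answer if support >= min_support else None), support


answer, support = corroborate(list(SOURCES.values()))
print(answer, support)
```

In this toy example, three of the four sources agree, so the answer clears the threshold. With only two sources that disagree, the function returns no answer at all, which mirrors the idea of only showing a direct answer when confidence is high enough.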
Optimising For the Google Answer Box
Considering that Google is relying on authoritative sites to provide these answers, it means webmasters need to work extremely hard to secure a spot in the answer box.
Here are some quick pointers on how to optimise your web pages to increase your chances of appearing in the answer boxes:
- Page Structure – organise the information on your web pages in an efficient and logical manner, e.g. include a table of contents and use headings to break up important sections that directly relate to questions around the topic area of your article.
- Answer Directly – Google’s answer boxes are usually short and to the point, so keep your answers to the desired questions as concise as possible.
- Supporting Content – whilst the main answer needs to be concise, it’s important to also provide supporting information on the subject to show Google that you are an authoritative source of information on the subject.
Did You Mean?
Sometimes, when we search for a query, words may be misspelled, or we may even use the wrong words altogether. Instead of asking the user to write the word or term again, Google corrects the word for us using a highly accurate and powerful machine learning algorithm and asks “Did You Mean?”…
Other times, the search engine simply shows you the results for the query with the correct spelling with “Showing results for”…
How Google’s ‘Did You Mean?’ Algorithm Works
Google’s former CIO Douglas Merrill explained the basic concept of how the Did You Mean algorithm works in this Search 101 presentation from 2007.
The process goes as follows:
- Searcher misspells a word in their search query
- Searcher doesn’t find what they’re looking for
- Searcher identifies the misspelled word and rewrites the query
- Searcher finds what they were looking for
If we then take into consideration that this pattern has been, and is still being, repeated millions of times, the statistical machine learning algorithm is able to learn which words are most commonly misspelled, as well as what the most common corrections are.
This allows Google to (almost) immediately offer spell corrections and present the correct search results for your query.
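The process above can be sketched in a few lines of Python. This is a simplified illustration, not Google’s implementation: it assumes we already have a log of query rewrites, and it treats every rewrite as a spelling correction. The `REWRITES` data and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical session log: (first query, rewritten query) pairs.
# In this toy data, every rewrite is assumed to be a spelling correction.
REWRITES = [
    ("eifel tower", "eiffel tower"),
    ("eifel tower", "eiffel tower"),
    ("eifel tower", "eifel mountains"),
    ("yorkshire terier", "yorkshire terrier"),
]


def build_correction_table(rewrites):
    """Count how often each original query was rewritten to each follow-up."""
    table = defaultdict(Counter)
    for original, rewritten in rewrites:
        if original != rewritten:
            table[original][rewritten] += 1
    return table


def did_you_mean(query, table):
    """Return the most frequent rewrite seen for this query, or None."""
    candidates = table.get(query)
    if not candidates:
        return None
    suggestion, _count = candidates.most_common(1)[0]
    return suggestion


table = build_correction_table(REWRITES)
print(did_you_mean("eifel tower", table))  # most common rewrite in the log
print(did_you_mean("pugs", table))         # never rewritten, so no suggestion
```

The more often the same rewrite appears in the log, the more confident the suggestion becomes, which is why the volume of searches matters so much here.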
Originally, spell checks would be made using a dictionary. However, this poses several problems: for instance, a mistyped word may itself be a valid dictionary word, so the error goes unnoticed.
Therefore, Google also looks at the context of the word, as explained in this Google Wave Developer Preview from 2009. The word and its surrounding words are matched against a language model that Google has built from the web.
This allows Google to automatically correct any incorrect spellings.
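To show why context matters, here is a small Python sketch of context-aware correction. It is a toy under stated assumptions rather than Google’s actual model: the “language model” is just a count of word pairs (bigrams) from a tiny made-up corpus, and the `CONFUSABLE` sets of easily confused words are hypothetical.

```python
from collections import Counter

# Tiny stand-in corpus; a real language model would be built from web-scale text.
CORPUS = (
    "how tall is the eiffel tower . "
    "the eiffel tower is in paris . "
    "the tour of paris starts at the tower . "
    "book a tour of the city ."
).split()

# Count how often each pair of neighbouring words appears.
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))

# Hypothetical confusion sets: each word is valid on its own, so a
# dictionary check alone would not flag either of them.
CONFUSABLE = {"tour": ["tour", "tower"], "tower": ["tower", "tour"]}


def context_score(prev_word, word, next_word):
    """Score a word by how often it appears next to its neighbours."""
    return BIGRAMS[(prev_word, word)] + BIGRAMS[(word, next_word)]


def correct(words):
    """Replace each confusable word with the candidate that best fits its context."""
    corrected = list(words)
    for i, word in enumerate(words):
        prev_word = corrected[i - 1] if i > 0 else "."
        next_word = words[i + 1] if i + 1 < len(words) else "."
        # The original word is listed first, so it wins any tie.
        candidates = CONFUSABLE.get(word, [word])
        corrected[i] = max(
            candidates, key=lambda c: context_score(prev_word, c, next_word)
        )
    return corrected


print(" ".join(correct("how tall is the eiffel tour".split())))
print(" ".join(correct("book a tour of paris".split())))
```

Here “tour” is a perfectly valid word, but next to “eiffel” the model has only ever seen “tower”, so the first query is corrected. In the second query, “tour” fits its neighbours and is left alone.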
By introducing context into the mix, Google is further emphasising its focus on user intent, concepts and topics as without this context, users would be presented with completely incorrect results.