Google’s Did you mean Algorithm

Now that we have understood what Google’s Did you mean feature is, it’s time to look at the algorithm behind it. The Did you mean algorithm works in five main steps, as shown in the image below:

[Image: Google’s Did you mean Algorithm]

Let us look into these steps one by one in detail.

How does Google’s “Did you mean” Algorithm work?

When you want to say something but miss a few characters or words, the first response you usually get is “Did you mean this?”. The same is true of Google’s search algorithms. As soon as you search for a misspelled word or term on Google, it shows you the famous “Did you mean” alternative. You have probably come across this scenario at least once. But have you ever wondered how Google figures it out? How does Google’s Did you mean algorithm work?

To answer these questions, this blog covers a detailed analysis of how Google’s Did you mean algorithm works, what goes on behind each step of its autocorrect feature, and more.

1. Finding possible substitute list using K-Gram and Jaccard Distance:

A k-gram index maps each k-gram to a posting list of all vocabulary terms that contain it. In this step, the list of possible substitutes is found by comparing the k-grams of the query with those of the vocabulary terms using the Jaccard Distance....
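As a rough illustration of this step, here is a minimal sketch of a k-gram index with Jaccard-based candidate lookup. The function names, the `$` boundary marker, the choice of k = 2, and the 0.6 distance cutoff are all assumptions made for the example, not details taken from Google's implementation.

```python
from collections import defaultdict


def kgrams(word, k=2):
    """Return the set of k-grams of `word`, padded with '$' boundary markers."""
    padded = f"${word}$"
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}


def build_index(vocabulary, k=2):
    """Map each k-gram to the set of vocabulary terms that contain it."""
    index = defaultdict(set)
    for term in vocabulary:
        for gram in kgrams(term, k):
            index[gram].add(term)
    return index


def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |A & B| / |A | B|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def candidates(query, index, k=2, max_distance=0.6):
    """Return (distance, term) pairs for terms sharing a k-gram with `query`.

    Only terms within `max_distance` (an arbitrary cutoff chosen for this
    sketch) are kept, sorted from closest to farthest.
    """
    grams = kgrams(query, k)
    seen = set()
    for gram in grams:
        seen |= index.get(gram, set())
    scored = [(jaccard_distance(grams, kgrams(term, k)), term) for term in seen]
    return sorted(pair for pair in scored if pair[0] <= max_distance)


# Hypothetical toy vocabulary, invented for illustration only.
vocabulary = ["board", "border", "lord", "aboard", "morbid"]
index = build_index(vocabulary)
substitutes = [term for _, term in candidates("bord", index)]
```

With this toy vocabulary, the misspelled query "bord" keeps "board", "border" and "lord" as substitutes, while "aboard" and "morbid" share too few bigrams and fall outside the cutoff.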

2. Filter unwanted substitutes using Levenshtein Distance:

The Levenshtein distance is also referred to as the Edit Distance. It measures how different two words are by counting the minimum number of single-character insertions, deletions, and substitutions needed to turn one word into the other. The lower the distance (i.e. the smaller the number returned), the more similar the words are....
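The filtering described here can be sketched with the standard dynamic-programming formulation of Levenshtein distance. The helper `filter_by_edit_distance` and its `max_edits` threshold are hypothetical names and values chosen for this example.

```python
def levenshtein(source, target):
    """Minimum number of insertions, deletions and substitutions
    needed to turn `source` into `target`."""
    # previous[j] holds the distance between the processed prefix of
    # `source` and the first j characters of `target`.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def filter_by_edit_distance(query, candidates, max_edits=2):
    """Keep only the candidates within `max_edits` edits of `query`."""
    return [c for c in candidates if levenshtein(query, c) <= max_edits]
```

For example, `filter_by_edit_distance("bord", ["board", "border", "lord"], max_edits=1)` drops "border", which needs two insertions, and keeps the other two.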

3. Filtering substitutes based on cost using Damerau–Levenshtein distance:

Damerau–Levenshtein distance is a variant of Levenshtein distance, which is itself a type of Edit distance. Edit distances are a large class of distance metrics that measure the dissimilarity between two strings by computing the minimum number of operations (from a fixed set of operations) needed to convert one string into the other. This can be seen as a form of pairwise string alignment....
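What distinguishes Damerau–Levenshtein distance from plain Levenshtein distance is one extra operation: swapping two adjacent characters counts as a single edit. The sketch below implements the restricted form, usually called optimal string alignment (OSA), in which no substring is edited more than once. It is one common simplification and not necessarily the variant used by Google; the function name is an assumption for this example.

```python
def osa_distance(source, target):
    """Optimal string alignment distance: Levenshtein distance plus
    transposition of two adjacent characters as a single edit."""
    rows, cols = len(source) + 1, len(target) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of the source prefix
    for j in range(cols):
        d[0][j] = j  # insert every character of the target prefix
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "teh" -> "the".
            if (i > 1 and j > 1
                    and source[i - 1] == target[j - 2]
                    and source[i - 2] == target[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

Under this measure "teh" is one edit away from "the", whereas plain Levenshtein distance would count two substitutions. This matters for spelling correction because swapped neighbouring letters are a very common typing error.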

4. Filtering non-related substitutes using a Hidden Markov Model:

A Hidden Markov Model can describe a situation where events occur but cannot be directly observed. If the relationship between the observed information and the hidden information is known, the hidden information can be deduced with some likelihood of success. This can be used to estimate hidden values or predict future hidden values. In this case, the “hidden” information is what the user meant to type, and the “observed” information is what they actually typed....
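To make the hidden/observed split concrete, here is a small sketch of Viterbi decoding, the standard way to recover the most likely hidden sequence in a Hidden Markov Model. Hidden states are intended words and observations are typed words. Every probability below is invented purely for illustration, and the function and variable names are assumptions for this example.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely sequence of hidden states for `observations`."""
    # best[t][s] = (probability of the best path ending in s at step t,
    #               previous state on that path)
    best = [{
        s: (start_p[s] * emit_p[s].get(observations[0], 0.0), None)
        for s in states
    }]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p].get(s, 0.0)
                 * emit_p[s].get(obs, 0.0), p)
                for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Made-up toy model: the user typed "form home" but likely meant "from home".
states = ["from", "form", "home"]
start_p = {"from": 0.5, "form": 0.2, "home": 0.3}
trans_p = {  # P(next intended word | current intended word)
    "from": {"from": 0.1, "form": 0.3, "home": 0.6},
    "form": {"from": 0.4, "form": 0.4, "home": 0.2},
    "home": {"from": 0.4, "form": 0.3, "home": 0.3},
}
emit_p = {  # P(typed word | intended word)
    "from": {"from": 0.8, "form": 0.2},
    "form": {"form": 0.9, "from": 0.1},
    "home": {"home": 1.0},
}
decoded = viterbi(["form", "home"], states, start_p, trans_p, emit_p)
```

With these toy numbers, "form" on its own is the more likely intended first word, but because "from" is followed by "home" much more often, the decoded sequence is "from home". This is how context can rule out substitutes that are unrelated to the rest of the query.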

5. Finding best-matched substitutes based on Probability and Statistics:

Statistics and probability are two powerful tools. What if we could say that one outcome is more likely than another? Suppose we had some statistics showing what I usually eat on sunny and rainy days. If it is a sunny day, I usually eat ice cream, and if it is a rainy day, I usually eat pudding. Based on this information, I am more likely to eat ice cream on a sunny day....
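The ice cream example can be turned into a short calculation of conditional probabilities from counts. The observation log below is invented for illustration, and the function names are assumptions. The same reasoning would apply to spelling correction if the condition were the typed query and the outcome the intended word.

```python
from collections import Counter

# Hypothetical log of (weather, food) observations; the counts are made up.
observations = (
    [("sunny", "ice cream")] * 8 + [("sunny", "pudding")] * 2
    + [("rainy", "pudding")] * 7 + [("rainy", "ice cream")] * 3
)


def conditional_probabilities(pairs, condition):
    """Estimate P(outcome | condition) from observed (condition, outcome) pairs."""
    counts = Counter(outcome for cond, outcome in pairs if cond == condition)
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}


def most_likely(pairs, condition):
    """Return the outcome with the highest estimated probability, or None
    if the condition was never observed."""
    probs = conditional_probabilities(pairs, condition)
    if not probs:
        return None
    return max(probs, key=probs.get)
```

Here `conditional_probabilities(observations, "sunny")` estimates the probability of ice cream on a sunny day as 8 out of 10, so `most_likely(observations, "sunny")` picks "ice cream".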