How are queries processed in a search engine?
Whenever we search for something on a search engine, the results appear within a second or two. However, a lot goes on in the backend. Indexing and querying are the two essential processes behind a search engine; they are its building blocks. Let's take a look at each:
Indexing
• The indexing process begins with web crawling, where programs called spiders (or crawlers) traverse the World Wide Web and collect data.
• The collected data is stored in a database for indexing. This step is also called text acquisition.
• The collected data is then broken down into tokens, or keywords, which the search engine uses to build its indexes. Each keyword is associated with the documents that contain it, so the data is organized in a way that lets the search engine quickly retrieve a particular piece of information.
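The tokenizing and indexing steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular search engine does it: the `tokenize` and `build_index` helpers and the three sample documents are assumptions made up for the example, and crawling and storage are skipped entirely.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Build an inverted index: token -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

# Hypothetical "crawled" documents, keyed by document id.
docs = {
    1: "Spiders crawl the web and collect data",
    2: "The search engine builds an index from collected data",
    3: "A ranking algorithm orders the matching documents",
}

index = build_index(docs)
print(sorted(index["data"]))  # ids of the documents that contain "data"
```

Looking up a keyword is now a single dictionary access instead of a scan over every document, which is the reason an index makes retrieval fast.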
Querying
• When a user searches for something on the search engine, a query is generated from the input.
• The search engine then parses the query and looks up the indexes for matching documents.
• Using a ranking algorithm, the search engine ranks the matching documents by relevance. Finally, the ranked list is presented to the user with the most relevant results at the top.
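The querying steps above (parse the query, look up the index, rank the matches) can be sketched as follows. The ranking here is a simple TF-IDF sum chosen for illustration; real search engines combine many more signals. The function names, the `+ 1.0` smoothing in the IDF term, and the sample documents are all assumptions of this sketch.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Build an index: token -> {doc_id: how often the token occurs there}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for token, count in Counter(tokenize(text)).items():
            index[token][doc_id] = count
    return index

def search(query, index, num_docs):
    """Return the ids of matching documents, most relevant first."""
    scores = defaultdict(float)
    for token in tokenize(query):          # parse the query into tokens
        postings = index.get(token, {})    # look the token up in the index
        if not postings:
            continue
        # Tokens that occur in fewer documents get a higher weight.
        idf = math.log(num_docs / len(postings)) + 1.0
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Sort by descending score; break ties by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical documents, keyed by document id.
docs = {
    1: "search engine index",
    2: "search engine ranking ranking algorithm",
    3: "library books and librarian",
}

index = build_index(docs)
print(search("search ranking", index, len(docs)))
```

Document 2 matches both query terms and repeats "ranking", so it is listed ahead of document 1; document 3 matches nothing and is left out.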
Search Engine
Imagine you are in a library, looking for a particular book. If you had to go through every book in each category, it would be a tedious and difficult task. And if the library holds more than a million books, the task is next to impossible. You are definitely going to need a librarian who can bring you the relevant books without delay. That is where a search engine comes in.
Search engine spamming refers to the practice of creating Web pages, or sets of Web pages, designed to get a high relevance rank for some queries even though the sites are not actually popular. Popularity ranking schemes such as PageRank make search engine spamming more difficult, since just repeating words to get a high TF-IDF score is no longer sufficient. However, even these techniques can be spammed by creating a collection of Web pages that point to each other, increasing their popularity rank. Techniques such as using sites instead of pages as the unit of ranking (with appropriately normalized jump probabilities) have been proposed to avoid some spamming techniques, but they are not fully effective against others. The war between search engine spammers and search engines continues even today.
The hubs and authorities approach of the HITS algorithm is more susceptible to spamming. A spammer can create a Web page containing links to good authorities on a topic, and the page gains a high hub score as a result. In addition, the spammer's Web page includes links to pages that they wish to popularize, which may not have any relevance to the topic. Because these linked pages are pointed to by a page with a high hub score, they get a high but undeserved authority score.
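To see why a collection of pages that point to each other raises a popularity rank, here is a small sketch of a simplified PageRank computed by power iteration. It is not the production algorithm of any search engine; the damping factor of 0.85, the iteration count, and the tiny graph with its page names are assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. links: page -> list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small base share from the random jump.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# A small hypothetical web in which nobody links to the "spam" page.
honest = {"a": ["b"], "b": ["c"], "c": ["a"], "spam": ["a"]}

# The same web plus a link farm: five pages that exist only to link to
# "spam", which links back to each of them.
farm_pages = [f"farm{i}" for i in range(5)]
farmed = dict(honest)
farmed["spam"] = ["a"] + farm_pages
for page in farm_pages:
    farmed[page] = ["spam"]

before = pagerank(honest)["spam"]
after = pagerank(farmed)["spam"]
print(f"rank of spam page without farm: {before:.3f}, with farm: {after:.3f}")
```

The spam page's rank rises once the farm is added, even though no genuine page links to it.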
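The HITS attack described above can be reproduced on a toy graph. The sketch below is a basic hubs-and-authorities iteration with normalization after each round; the graph, the page names, and the iteration count are hypothetical and chosen only to make the effect visible.

```python
import math

def hits(links, iterations=50):
    """Basic HITS. links: page -> list of pages it links to.
    Returns (hub scores, authority scores)."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking to the page.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub score: sum of the authority scores of the pages it links to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize both score vectors so the values stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

links = {
    "hub1": ["auth1", "auth2"],                 # genuine hubs on the topic
    "hub2": ["auth1", "auth2"],
    "spammer": ["auth1", "auth2", "promoted"],  # copies good links, adds its own
    "minor": ["obscure"],                       # an unrelated page with one link
}

hub, auth = hits(links)
print(f"hub score:  spammer={hub['spammer']:.3f}  hub1={hub['hub1']:.3f}")
print(f"authority:  promoted={auth['promoted']:.3f}  obscure={auth['obscure']:.3f}")
```

By linking to the good authorities, the spammer's page ends up with a hub score at least as high as the genuine hubs, and the page it promotes inherits a noticeable authority score from that single link, while the page linked only from a weak hub scores close to zero.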
Table of Contents
- What is a Search Engine?
- History of search engines
- Working of a search engine
- Architecture Of Search Engine
- How are queries processed in a search engine?
- Search Engine Advantages
- Examples Of Popularly Used Search Engines