What is Zipf’s Law?
Zipf’s law is also known as the principle of least effort. In natural language texts, it has been observed that:
- The second most used word appears half as often as the most used word.
- The third most used word appears one-third the number of times the most used word appears, and so on.
Zipf proposed that such a distribution was observed because we tend to frequently use words that we are more comfortable with. We try to communicate as efficiently as possible by putting in the least amount of effort.
F Auerbach, a German physicist observed the phenomenon concerning population in cities. The second most populous city had half the population of the most populous city. In 1932, Zipf observed a similar distribution of word frequencies in natural language text (English). He proposed a law based on his findings and it began to be known as Zipf’s law. The same kind of relationship was observed in corporation sizes, income of people etc.
Zipf’s Law
Zipf’s law is an empirical formula discovered by George Zipf in 1930s. Zip’s law describes the relationship between the frequency of words in language corpus and their rank in a frequency sorted list. In this article, we will be diving into the concept of Zipf’s law and its application in natural language processing.
Table of Content
- What is Zipf’s Law?
- Mathematical Formulation
- Example of Zipf’s Law
- Python Implementation of Zipf’s Law
- Applications
- Deviation from Zipf’s Law