Stemming and Lemmatization

When working with natural language, we usually care less about the surface form of a word than about the meaning it is meant to convey. We therefore try to map every word to its root or base form. This process is called canonicalization.

For example, the words ‘play’, ‘plays’, ‘played’, and ‘playing’ all convey the same action, so we can map each of them to the base form ‘play’.

Two canonicalization techniques are widely used: stemming and lemmatization.
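As a rough illustration of the two techniques, the sketch below runs the ‘play’ example through NLTK's `PorterStemmer` and wraps `WordNetLemmatizer` in a small helper. It assumes NLTK is installed. The helper name `lemmatize_as_verbs` is made up for this example, and it is only defined here, not called, because lemmatization additionally assumes the WordNet corpus has been downloaded.

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer

words = ["play", "plays", "played", "playing"]

# Stemming: rule-based suffix stripping; needs no extra data files.
stemmer = PorterStemmer()
stems = {word: stemmer.stem(word) for word in words}

for word, stem in stems.items():
    print(f"{word} -> {stem}")


def lemmatize_as_verbs(tokens):
    """Hypothetical helper: look up each token's lemma, treating it as a verb.

    Assumes the WordNet corpus is installed, e.g. via nltk.download("wordnet").
    """
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(token, pos="v") for token in tokens]
```

All four forms should reduce to a single shared stem. Once the WordNet data is available, calling `lemmatize_as_verbs(words)` gives the dictionary-based counterpart for comparison.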

Introduction to NLTK: Tokenization, Stemming, Lemmatization, POS Tagging

Natural Language Toolkit (NLTK) is one of the most widely used Python libraries for Natural Language Processing. Its API covers a broad range of tasks, from rudimentary text pre-processing to tasks like vectorized representation of text. In this article, we will get familiar with the basics of NLTK and perform some crucial NLP tasks: Tokenization, Stemming, Lemmatization, and POS Tagging.

Table of Contents

What is the Natural Language Toolkit (NLTK)?
Tokenization
Stemming and Lemmatization
Stemming
Lemmatization
Part of Speech Tagging
