Generating Bigrams using NLTK
Generating bigrams using the Natural Language Toolkit (NLTK) in Python is a straightforward process. The steps to generate bigrams from text data using NLTK are discussed below:
- Import NLTK and Download Tokenizer: The code first imports the `nltk` library and downloads the `punkt` tokenizer, which is part of NLTK's data used for tokenization.
- Tokenization: The `word_tokenize()` function from `nltk.tokenize` is used to tokenize the input text into a list of words (`tokens`). Tokenization is the process of splitting a text into individual words or tokens.
- Generating Bigrams: The `bigrams` function from `nltk.util` is then used to generate a list of bigrams from the tokenized words. Each bigram is a tuple containing two consecutive words from the text.
- Printing Bigrams: Finally, the code iterates over the list of bigrams (`bigram_list`) and prints each bigram.
import nltk
nltk.download('punkt') # Download the 'punkt' tokenizer
from nltk.tokenize import word_tokenize
from nltk.util import bigrams
# Sample text
text = "You are learning from Geeks for Geeks"
# Tokenize the text
tokens = word_tokenize(text)
# Generate bigrams
bigram_list = list(bigrams(tokens))
# Print the bigrams
for bigram in bigram_list:
    print(bigram)
Output:
('You', 'are')
('are', 'learning')
('learning', 'from')
('from', 'Geeks')
('Geeks', 'for')
('for', 'Geeks')
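For comparison, the same pairing can be done without NLTK using Python's built-in `zip`, which is a handy way to sanity-check the output above or to work in environments where downloading NLTK data is not possible. This is a minimal sketch; the `generate_bigrams` helper name is our own, not part of any library, and `str.split()` is used here as a simple stand-in for `word_tokenize()`.

```python
from collections import Counter

def generate_bigrams(tokens):
    # Pair each token with its successor: zip stops at the shorter sequence,
    # so tokens[1:] shifts the list by one to form consecutive pairs.
    return list(zip(tokens, tokens[1:]))

tokens = "You are learning from Geeks for Geeks".split()
bigram_list = generate_bigrams(tokens)
print(bigram_list[0])  # ('You', 'are')

# Counting bigram occurrences is a common next step, e.g. for
# frequency-based language models.
counts = Counter(bigram_list)
print(counts[('for', 'Geeks')])  # 1
```

Note that `str.split()` only breaks on whitespace, so punctuation stays attached to words; for real text, NLTK's `word_tokenize()` handles such cases more robustly.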
Generate bigrams with NLTK
Bigrams, or pairs of consecutive words, are an essential concept in natural language processing (NLP) and computational linguistics. Their utility spans various applications, from enhancing machine learning models to improving language understanding in AI systems. In this article, we are going to learn how bigrams are generated using the NLTK library.
Table of Content
- What are Bigrams?
- How Bigrams are generated?
- Generating Bigrams using NLTK
- Applications of Bigrams
- FAQs on Bigrams in NLP