Visualization of Word Embeddings using t-SNE

Visualizing word embeddings can provide insight into how words are positioned relative to one another in the embedding space. In the code below, we train a Word2Vec model on the ‘text8’ corpus and then use t-SNE (t-distributed Stochastic Neighbor Embedding), a dimensionality-reduction technique, to project the learned high-dimensional embeddings into 2D for plotting.

Code Steps:

  1. Import necessary libraries.
  2. Load the ‘text8’ corpus.
  3. Train a Word2Vec model on the corpus.
  4. Define sample words for visualization.
  5. Filter words existing in the model’s vocabulary.
  6. Retrieve word embeddings for sample words.
  7. Convert embeddings to a numpy array.
  8. Print original embedding vector shape.
  9. Use t-SNE to reduce embeddings to 2D.
  10. Print the shape of reduced embeddings.
  11. Plot word embeddings using Matplotlib.
  12. Set plot attributes.
  13. Save the plot as an image file.
  14. Display the plot.
Python3
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
import gensim.downloader as api
from gensim.models import Word2Vec

# Load the text8 corpus from gensim
corpus = api.load('text8')

# Train a Word2Vec model on the text8 corpus
model = Word2Vec(corpus)
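
# Optional: training on text8 can take a few minutes, so to avoid
# retraining on every run you can persist the model with gensim's
# save/load API (the filename below is illustrative):
#   model.save('word2vec_text8.model')
#   model = Word2Vec.load('word2vec_text8.model')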

# Sample words for visualization
words = ['cat', 'dog', 'elephant', 'lion', 'bird', 'rat', 'wolf', 'cow',
         'goat', 'snake', 'rabbit', 'human', 'parrot', 'fox', 'peacock',
         'lotus', 'roses', 'marigold', 'jasmine', 'computer', 'robot',
         'software', 'vocabulary', 'machine', 'eye', 'vision',
         'grammar', 'words', 'sentences', 'language', 'verbs', 'noun',
         'transformer', 'embedding', 'neural', 'network', 'optimization']

# Filter words that exist in the model's vocabulary
words = [word for word in words if word in model.wv.key_to_index]

# Get word embeddings for sample words from the pre-trained model
word_embeddings = [model.wv[word] for word in words]

# Convert word embeddings to a numpy array
embeddings = np.array(word_embeddings)

# Print original embedding vector shape
print('Original embedding vector shape', embeddings.shape)

# Use t-SNE to reduce dimensionality to 2D. Perplexity must be smaller
# than the number of samples, so a small value is used for this word set
tsne = TSNE(n_components=2, perplexity=2)
embeddings_2d = tsne.fit_transform(embeddings)

# Print the shape of the embeddings after applying t-SNE
print('After applying t-SNE, embedding vector shape', embeddings_2d.shape)

# Plot the word embedding graph
# Set figure size and DPI for high-resolution output
plt.figure(figsize=(10, 7), dpi=1000)
plt.scatter(embeddings_2d[:, 0], embeddings_2d[:, 1], marker='o')

# Add labels to data points
for i, word in enumerate(words):
    plt.text(embeddings_2d[i, 0], embeddings_2d[i, 1], word,
             fontsize=10, ha='left', va='bottom')  # Adjust text placement for better readability

plt.xlabel('t-SNE Dimension 1')
plt.ylabel('t-SNE Dimension 2')
plt.title('Word Embedding Graph (t-SNE with Word2Vec)')
plt.grid(True)
plt.savefig('embedding.png')  # Save the plot as an image file
plt.show()

Output:

Original embedding vector shape (37, 100)
After applying t-SNE, embedding vector shape (37, 2)


Figure: Word embedding graph (t-SNE with Word2Vec), saved as embedding.png
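
As a quick sanity check on the clusters that appear in the plot, you can also query the trained model directly for nearest neighbours in the original 100-dimensional space. Below is a minimal sketch using gensim's most_similar; the exact neighbours will vary between training runs.

Python3
# Inspect the nearest neighbours of a few sample words in the full
# 100-dimensional embedding space (results vary between training runs)
for word in ['cat', 'computer', 'language']:
    print(word, '->', model.wv.most_similar(word, topn=3))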


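One caveat worth noting: both Word2Vec training and t-SNE are stochastic, so repeated runs of the script will produce differently arranged (though similarly clustered) plots. If you need a reproducible layout, a minimal adjustment is to fix the seed via scikit-learn's standard random_state parameter:

Python3
# Fix the random seed so the t-SNE layout is the same on every run
tsne = TSNE(n_components=2, perplexity=2, random_state=42)
embeddings_2d = tsne.fit_transform(embeddings)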