Extractive Summarization

Extractive summarization algorithms generate a summary by selecting and combining key passages from the source material. Unlike a human writer, these models focus on extracting the most essential sentences from the original text rather than composing new ones.

A common choice for extractive summarization is the TextRank algorithm, a graph-based ranking method that is highly suitable for text summarization tasks. Let’s explore how it functions, starting with the simplified sketch below and then moving to a sample text summarization scenario.
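Before turning to the full pipeline, here is a minimal sketch of the idea behind TextRank, assuming a simple word-overlap similarity and the networkx library (pytextrank's real implementation builds a lemma graph and ranks phrases, so treat this only as an illustration): sentences become nodes in a graph, edges are weighted by how many words two sentences share, and PageRank scores each sentence by its centrality.

import itertools
import networkx as nx

def textrank_sentences(sentences, top_n=2):
    # Nodes are sentence indices; edge weights count shared words
    tokenized = [set(s.lower().split()) for s in sentences]
    graph = nx.Graph()
    for (i, a), (j, b) in itertools.combinations(enumerate(tokenized), 2):
        overlap = len(a & b)
        if overlap:
            graph.add_edge(i, j, weight=overlap)
    # PageRank scores each sentence by its centrality in the graph
    scores = nx.pagerank(graph)
    top = sorted(scores, key=scores.get, reverse=True)[:top_n]
    # Return the top-ranked sentences in their original order
    return [sentences[i] for i in sorted(top)]

sentences = [
    "Deep learning is part of a broader family of machine learning methods.",
    "Machine learning can be supervised, semi-supervised or unsupervised.",
    "Deep learning architectures have been applied to computer vision.",
]
print(textrank_sentences(sentences, top_n=2))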

Utilizing the TextRank Algorithm for Extractive Text Summarization:

The implementation used here, pytextrank, provides TextRank as a spaCy pipeline component. spaCy is an excellent Python library for addressing challenges in natural language processing, and pytextrank is a spaCy extension that implements the TextRank algorithm on top of it. As the example below shows, TextRank can produce reasonably satisfactory results. Nevertheless, extractive summarization techniques merely return a pruned version of the original text, retaining the sentences that were not eliminated, instead of generating new text to summarize the information contained in the original.

Prerequisite

Spacy

To install spaCy and download the English language model, run the commands below (the ! prefix is for notebook cells; omit it in a terminal):

!pip install spacy

To download the English language model:

!python3 -m spacy download en_core_web_lg

TextRank

To install pytextrank:

!pip install pytextrank

Text Summarization

This code uses spaCy and PyTextRank to automatically summarize a given text. With the required packages installed and the spaCy language model downloaded, it loads the model with the TextRank summarization pipeline, processes a lengthy text, and generates a summary from the text’s key phrases and sentences. The summary is limited to 2 phrases and 2 sentences.

Python3
import spacy
import pytextrank  # registers the "textrank" component with spaCy

# Load the large English model and add TextRank to its pipeline
nlp = spacy.load("en_core_web_lg")
nlp.add_pipe("textrank")
 
example_text = """Deep learning (also known as deep structured learning) is part of a
broader family of machine learning methods based on artificial neural networks with
representation learning. Learning can be supervised, semi-supervised or unsupervised.
Deep-learning architectures such as deep neural networks, deep belief networks, deep reinforcement learning,
recurrent neural networks and convolutional neural networks have been applied to
fields including computer vision, speech recognition, natural language processing,
machine translation, bioinformatics, drug design, medical image analysis, material
inspection and board game programs, where they have produced results comparable to
and in some cases surpassing human expert performance. Artificial neural networks
(ANNs) were inspired by information processing and distributed communication nodes
in biological systems. ANNs have various differences from biological brains. Specifically,
neural networks tend to be static and symbolic, while the biological brain of most living organisms
is dynamic (plastic) and analogue. The adjective "deep" in deep learning refers to the use of multiple
layers in the network. Early work showed that a linear perceptron cannot be a universal classifier,
but that a network with a nonpolynomial activation function with one hidden layer of unbounded width can.
Deep learning is a modern variation which is concerned with an unbounded number of layers of bounded size,
which permits practical application and optimized implementation, while retaining theoretical universality
under mild conditions. In deep learning the layers are also permitted to be heterogeneous and to deviate widely
from biologically informed connectionist models, for the sake of efficiency, trainability and understandability,
whence the structured part."""
print('Original Document Size:', len(example_text))
doc = nlp(example_text)

# Rank sentences and print the top ones; limit_phrases caps how many
# key phrases are considered, limit_sentences caps the summary length
for sent in doc._.textrank.summary(limit_phrases=2, limit_sentences=2):
    print(sent)
    # len(sent) is the number of tokens in the Span, not characters
    print('Summary Length:', len(sent))


Output:

Original Document Size: 1808
Deep-learning architectures such as deep neural networks, deep belief networks, deep reinforcement learning, recurrent neural networks and convolutional neural networks have been applied to fields including computer vision, speech recognition, natural language processing, machine translation, bioinformatics, drug design, medical image analysis, material inspection and board game programs, where they have produced results comparable to and in some cases surpassing human expert performance.
Summary Length: 76
Specifically, neural networks tend to be static and symbolic, while the biological brain of most living organisms is dynamic (plastic) and analogue.
Summary Length: 27
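
As a quick way to see what drove that selection, pytextrank also exposes the ranked phrases on the processed document. Reusing doc from the example above, this snippet prints the top phrases with their TextRank scores and occurrence counts:

# doc._.phrases is provided by the pytextrank extension;
# each entry carries the phrase text, its rank score and its count
for phrase in doc._.phrases[:5]:
    print(phrase.text, round(phrase.rank, 4), phrase.count)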

Text Summarization in NLP

Automatic text summarization refers to a group of methods that use algorithms to condense a body of text while preserving its key points. Although it may not receive as much attention as other machine learning successes, this field of computer automation has seen consistent advancement and improvement. Systems capable of extracting the key concepts from a text while maintaining its overall meaning have the potential to revolutionize a variety of industries, including banking, law, and even healthcare.

Types of Text Summarization

There are typically two basic methods for automatic text summarization:

  1. Extractive summarization
  2. Abstractive summarization


Conclusion

Abstractive summarization techniques emulate human writing by generating entirely new sentences to convey key concepts from the source text, rather than merely rephrasing portions of it. These fresh sentences distill the vital information while eliminating irrelevant details, often incorporating novel vocabulary absent from the original text. Transformers have recently dominated the natural language processing field, although abstractive models initially relied on designs based on recurrent neural networks (RNNs).
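
For contrast with the extractive example above, here is a minimal abstractive sketch using the Hugging Face transformers library; the facebook/bart-large-cnn checkpoint and the length limits are illustrative choices, not ones this article prescribes:

from transformers import pipeline

# Any seq2seq summarization checkpoint works here; BART fine-tuned
# on CNN/DailyMail is a common choice for abstractive summaries
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

text = """Deep learning is part of a broader family of machine learning
methods based on artificial neural networks with representation learning.
Learning can be supervised, semi-supervised or unsupervised."""

# The model generates new sentences instead of copying from the input
result = summarizer(text, max_length=40, min_length=10, do_sample=False)
print(result[0]["summary_text"])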