Extractive Summarization
Extractive summarization algorithms generate a summary by selecting and combining key passages from the source material. Rather than writing new sentences the way a human might, these models identify and extract the most essential sentences from the original text.
Extractive summarization commonly uses the TextRank algorithm, which is well suited to text summarization tasks. Let’s explore how it works with a sample text summarization scenario.
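Before turning to the library-based implementation, the core idea of TextRank can be sketched in plain Python (a simplified, dependency-free illustration with hypothetical helper names, not the pytextrank implementation): sentences become nodes in a graph, edges are weighted by word overlap, and a PageRank-style power iteration scores each sentence. The highest-scoring sentences are returned verbatim.

```python
import re


def split_sentences(text):
    # Naive sentence splitter, for illustration only.
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]


def similarity(a, b):
    # Word-overlap similarity between two sentences.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / (len(wa) + len(wb))


def textrank_summary(text, limit_sentences=2, damping=0.85, iters=30):
    sents = split_sentences(text)
    n = len(sents)
    if n <= limit_sentences:
        return sents
    # Build the weighted sentence-similarity graph.
    weights = [[similarity(sents[i], sents[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    # PageRank-style power iteration over the graph.
    scores = [1.0 / n] * n
    for _ in range(iters):
        new_scores = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                out = sum(weights[j])
                if out > 0:
                    rank += weights[j][i] / out * scores[j]
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    # Keep the top sentences, in their original order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:limit_sentences]
    return [sents[i] for i in sorted(top)]
```

Note that, being extractive, every sentence this function returns appears word-for-word in the input text; only the selection is learned from the graph.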
Utilizing the TextRank Algorithm for Extractive Text Summarization:
The implementation of TextRank used here is provided as a spaCy pipeline component. spaCy is an excellent Python library for natural language processing, and pytextrank is a spaCy extension that implements the TextRank algorithm. As the output below shows, TextRank can produce reasonably good results. Nevertheless, extractive summarization techniques merely return a condensed version of the original text, keeping the sentences that were not eliminated, rather than generating new text to summarize the information in the source.
Prerequisite
spaCy
To install spaCy, run the command below in a terminal (or a notebook cell):
!pip install spacy
To download the English language model:
!python3 -m spacy download en_core_web_lg
TextRank
To install pytextrank:
!pip install pytextrank
Text Summarizations
This code uses spaCy and pytextrank to automatically summarize a given text. With the packages installed as above, it loads the spaCy language model, adds the TextRank summarization pipeline, processes a lengthy text, and prints a summary built from the text’s key phrases and sentences. The summary is limited to 2 phrases and 2 sentences.
Python3
import spacy
import pytextrank

# Load the large English model and add the TextRank pipeline component
nlp = spacy.load("en_core_web_lg")
nlp.add_pipe("textrank")

example_text = """Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised. Deep-learning architectures such as deep neural networks, deep belief networks, deep reinforcement learning, recurrent neural networks and convolutional neural networks have been applied to fields including computer vision, speech recognition, natural language processing, machine translation, bioinformatics, drug design, medical image analysis, material inspection and board game programs, where they have produced results comparable to and in some cases surpassing human expert performance. Artificial neural networks (ANNs) were inspired by information processing and distributed communication nodes in biological systems. ANNs have various differences from biological brains. Specifically, neural networks tend to be static and symbolic, while the biological brain of most living organisms is dynamic (plastic) and analogue. The adjective "deep" in deep learning refers to the use of multiple layers in the network. Early work showed that a linear perceptron cannot be a universal classifier, but that a network with a nonpolynomial activation function with one hidden layer of unbounded width can. Deep learning is a modern variation which is concerned with an unbounded number of layers of bounded size, which permits practical application and optimized implementation, while retaining theoretical universality under mild conditions.
In deep learning the layers are also permitted to be heterogeneous and to deviate widely from biologically informed connectionist models, for the sake of efficiency, trainability and understandability, whence the structured part."""

print('Original Document Size:', len(example_text))

doc = nlp(example_text)

# Print the top-ranked sentences, limited to 2 phrases and 2 sentences.
# Note: len() of a spaCy Span counts tokens, not characters.
for sent in doc._.textrank.summary(limit_phrases=2, limit_sentences=2):
    print(sent)
    print('Summary Length:', len(sent))
Output:
Original Document Size: 1808
Deep-learning architectures such as deep neural networks, deep belief networks, deep reinforcement learning, recurrent neural networks and convolutional neural networks have been applied to fields including computer vision, speech recognition, natural language processing, machine translation, bioinformatics, drug design, medical image analysis, material inspection and board game programs, where they have produced results comparable to and in some cases surpassing human expert performance.
Summary Length: 76
Specifically, neural networks tend to be static and symbolic, while the biological brain of most living organisms is dynamic (plastic) and analogue.
Summary Length: 27
Text Summarization in NLP
Automatic text summarization refers to a group of methods that employ algorithms to compress a certain amount of text while preserving the text’s key points. Although it may not receive as much attention as other machine learning successes, this field of computer automation has witnessed consistent advancement and improvement. Systems capable of extracting the key concepts from a text while maintaining its overall meaning have the potential to revolutionize a variety of industries, including banking, law, and even healthcare.
Types of Text Summarization
There are typically two basic methods for automatic text summarization:
- Extractive summarization
- Abstractive summarization
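The defining property of the extractive approach can be shown with an even simpler scorer (a hypothetical sketch, not TextRank): rank each sentence by the average corpus-wide frequency of its words and return the top sentences. Because the summary is assembled from existing sentences, every line of it appears verbatim in the source, whereas an abstractive summarizer would generate new sentences of its own.

```python
import re
from collections import Counter


def extractive_summary(text, limit_sentences=1):
    # Split into sentences (naive splitter, for illustration only).
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    # Score each sentence by the average frequency of its words.
    freq = Counter(w for s in sentences for w in s.lower().split())

    def score(sent):
        words = sent.lower().split()
        return sum(freq[w] for w in words) / len(words)

    # Return the top-scoring sentences, copied verbatim from the source.
    return sorted(sentences, key=score, reverse=True)[:limit_sentences]
```

Every returned sentence is a substring of the input, which is exactly the behavior that distinguishes extractive from abstractive summarization.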