Reading a particular page from a PDF using PyPDF2

The second example uses the PyPDF2 library, which can be installed by running the following command:

pip install PyPDF2

PyPDF2 achieves the same objective as the previous example. The library processes PDF files and supports operations such as reading, writing and creating them. For the task at hand, the page object's text extraction method is used to obtain the text of a single page and display it. The code for this is as follows:

Python3

# Import the reader class from PyPDF2
from PyPDF2 import PdfReader

input_file = r"test.pdf"

# Page indices are zero-based, so 4 selects the fifth page
page = 4

# Open the PDF in binary mode; the with block closes the file afterwards
with open(input_file, "rb") as pdf_file:
    # Create a PDF reader object (PdfReader replaces PdfFileReader,
    # which was removed in PyPDF2 3.0)
    pdf_reader = PdfReader(pdf_file)

    # Get the page object at the requested index
    page_obj = pdf_reader.pages[page]

    # Extract the text from the page
    data = page_obj.extract_text()

print(data)

Output:

He started this Journey with just one 
thought- every geek should have 
access to a never ending range of 
academic resources and with a lot 
of hardwork and determination, 
w3wiki was born.
Through this platform, he has        
successfully enriched the minds of 
students with knowledge which has 
led to a boost in their careers. But 
most importantly, w3wiki 
will always help students stay in 
touch with their Geeky side!
EXPERT ADVICE
CEO and Founder of 
w3wiki
                  I understand that many 
students who come to us are 
either fans of the sciences or 
have been pushed into this 
field by their parents.
And I just want you to 
know that no matter 
where life takes you, we 
at w3wiki hope 
to have made this 
journey easier for  
you.Mr. Sandeep Jain
3

Explanation:

First, the path to the input PDF and the page number are stored in separate variables. The PDF file is then opened in binary mode, and the resulting file object is passed to the PyPDF2 reader class (named PdfFileReader in older releases and PdfReader in current ones), which creates a PDF reader object from it. The page at the index held in the page variable is fetched from the reader; the index is zero-based, so a value of 4 selects the fifth page of the document. The text is then extracted from that page object, and the file object is closed. Finally, the extracted text is printed.
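The steps above can also be wrapped in a small reusable function. The following is a minimal sketch, not part of the original example: the helper names to_page_index and read_page_text are hypothetical, and the sketch assumes a PyPDF2 release that provides PdfReader, pages and extract_text. It accepts a 1-based page number, as shown in most PDF viewers, and checks it against the page count before reading.

```python
def to_page_index(page_number, page_count):
    """Convert a 1-based page number to the 0-based index used by the reader.

    Hypothetical helper: raises ValueError if the page does not exist.
    """
    if not 1 <= page_number <= page_count:
        raise ValueError(
            f"page {page_number} is out of range (document has {page_count} pages)"
        )
    return page_number - 1


def read_page_text(path, page_number):
    """Return the text of one page of the PDF at `path` (hypothetical helper)."""
    # Imported here so to_page_index stays usable without PyPDF2 installed.
    from PyPDF2 import PdfReader

    with open(path, "rb") as pdf_file:
        reader = PdfReader(pdf_file)
        index = to_page_index(page_number, len(reader.pages))
        # Extract the text while the file is still open.
        return reader.pages[index].extract_text()
```

With these helpers, read_page_text("test.pdf", 5) would read the same page as the example above, since page number 5 corresponds to index 4.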

Read a Particular Page from a PDF File in Python

Document processing is one of the most common use cases for the Python programming language. Python can process many kinds of files, including database files, multimedia files and encrypted files, to name a few. This article shows how to read a particular page from a PDF (Portable Document Format) file in Python.
