Reading a particular page from a PDF using PyPDF2
The second example uses the PyPDF2 library, which can be installed by running the following command:
pip install PyPDF2
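PyPDF2 3.0 removed the older camel-case names such as PdfFileReader in favour of names such as PdfReader, so it can help to confirm which release pip actually installed. The snippet below is a minimal sketch of such a check, assuming Python 3.8 or later (for importlib.metadata); the helper name installed_version is hypothetical and not part of PyPDF2.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        # The package is not installed in the current environment
        return None


# Prints the installed PyPDF2 version, or None if PyPDF2 is missing
print("PyPDF2 version:", installed_version("PyPDF2"))
```

If the printed value is None, the install command above has not been run in the environment that this interpreter uses.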
The same objective can be achieved with the PyPDF2 library, which supports operations such as reading, writing and creating PDF files. For this task, the library's text extraction function is used to obtain the text of the selected page and display it. The code is as follows:
Python3
# Importing the required module
import PyPDF2

input_file = r"test.pdf"
page = 4  # zero-based index, so this selects the fifth page

# Creating a pdf file object
pdfFileObj = open(input_file, 'rb')

# Creating a pdf reader object
# (PdfReader was named PdfFileReader before PyPDF2 3.0)
pdfReader = PyPDF2.PdfReader(pdfFileObj)

# Creating a page object (formerly pdfReader.getPage(page))
pageObj = pdfReader.pages[page]

# Extracting text from the page (formerly pageObj.extractText())
data = pageObj.extract_text()

# Closing the pdf file object
pdfFileObj.close()

print(data)
Output:
He started this Journey with just one thought- every geek should have access to a never ending range of academic resources and with a lot of hardwork and determination, w3wiki was born. Through this platform, he has successfully enriched the minds of students with knowledge which has led to a boost in their careers. But most importantly, w3wiki will always help students stay in touch with their Geeky side! EXPERT ADVICE CEO and Founder of w3wiki I understand that many students who come to us are either fans of the sciences or have been pushed into this field by their parents. And I just want you to know that no matter where life takes you, we at w3wiki hope to have made this journey easier for you.Mr. Sandeep Jain 3
Explanation:
First, the path to the input PDF and the page index are stored in separate variables. The PDF file is then opened in binary read mode, and the resulting file object is passed to PyPDF2's reader class (PdfReader in PyPDF2 3.0 and later, PdfFileReader in older releases), which builds a PDF reader object from it. Next, the page at the index held in the page variable is retrieved; indexing starts at 0, so a value of 4 refers to the fifth page. Finally, the text is extracted from that page object, the file object is closed, and the extracted text is printed.
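The steps above can also be wrapped in a small reusable function. The following is a minimal sketch, not the article's own code: it assumes PyPDF2 3.0 or later (PdfReader, pages and extract_text), accepts a 1-based page number for readability, and rejects page numbers outside the document. The function names extract_page_text and read_pdf_page are hypothetical.

```python
def extract_page_text(reader, page_number):
    """Return the text of a 1-based page number from a PdfReader-like object.

    `reader` only needs a `pages` sequence whose items provide `extract_text()`.
    """
    page_count = len(reader.pages)
    if not 1 <= page_number <= page_count:
        raise ValueError(
            f"page_number must be between 1 and {page_count}, got {page_number}"
        )
    # PyPDF2 indexes pages from 0, so subtract 1 from the human page number
    return reader.pages[page_number - 1].extract_text()


def read_pdf_page(path, page_number):
    """Open the PDF at `path` and return the text of the given 1-based page."""
    # Imported here so the helper above can be used without PyPDF2 installed
    from PyPDF2 import PdfReader

    # The with block closes the file even if extraction raises an error
    with open(path, "rb") as pdf_file:
        return extract_page_text(PdfReader(pdf_file), page_number)
```

With this sketch, read_pdf_page("test.pdf", 5) requests the same page as the example above, since index 4 is the fifth page of the document.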
Read a Particular Page from a PDF File in Python
Document processing is one of the most common use cases for the Python programming language. Python can process many kinds of files, such as database files, multimedia files and encrypted files, to name a few. This article shows how to read a particular page from a PDF (Portable Document Format) file in Python.