How to Read a PDF File in Python

PDF (Portable Document Format) is one of the most widely used digital document formats. It is used to display and exchange documents reliably, independent of software, hardware, or operating system.

In this article, we will see how to read a PDF file in Python. For that, we use the third-party Python module PyPDF2. This module can extract document information, split documents page by page, merge documents, crop pages, merge multiple pages into a single page, and encrypt and decrypt PDF files.

To install PyPDF2, run:

pip install PyPDF2

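import PyPDF2

# Open the PDF in binary mode
pdf_FileOb = open('test.pdf', 'rb')

# Create a reader object for the opened file
pdf_Reader = PyPDF2.PdfFileReader(pdf_FileOb)
print("The number of pages:", pdf_Reader.numPages)

# Get the first page (page numbers start at 0) and print its text
page_Ob = pdf_Reader.getPage(0)
print(page_Ob.extractText())

# Close the file once all operations are done
pdf_FileOb.close()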

Output:

The number of pages: 1

Take
Risks In Your Life

If
You Win, You Can Lead !



-
Swami Vivekananda

Now let's see what this code means.

The first step is to import the PyPDF2 module. After that, we open our PDF file in binary mode using the built-in open() function. The next step is to pass the opened file to the PdfFileReader class of the PyPDF2 module, which gives us a PDF reader object. The numPages property gives the number of pages in the PDF file. The getPage() method takes a page number as an argument (counting from 0) and returns the corresponding page object. The extractText() method extracts the text from the selected page. Finally, after doing all the operations on the PDF file, we have to close the file object using close(). Note that these are the names used by PyPDF2 releases before 3.0; from 3.0 onward they were replaced by PdfReader, len(reader.pages), reader.pages[0], and extract_text().
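For readers on PyPDF2 3.x, the same steps can be sketched with the newer names. This is a minimal sketch, not the article's original code: the helper names first_page_summary and read_pdf are hypothetical, and the import is placed inside read_pdf only so the first helper can be tried without the library installed.

```python
def first_page_summary(reader):
    """Return (page_count, first_page_text) from a PdfReader-like object."""
    page_count = len(reader.pages)          # replaces numPages
    if page_count == 0:
        return 0, ""
    first_page = reader.pages[0]            # replaces getPage(0)
    return page_count, first_page.extract_text()  # replaces extractText()


def read_pdf(path):
    """Open the PDF at `path` and summarize its first page (hypothetical helper)."""
    # Imported here so first_page_summary works even without PyPDF2 installed.
    from PyPDF2 import PdfReader            # replaces PdfFileReader

    # The with-block closes the file automatically, so no explicit close().
    with open(path, "rb") as pdf_file:
        return first_page_summary(PdfReader(pdf_file))
```

Calling read_pdf('test.pdf') should then return the page count and the first page's text as a tuple, assuming the file exists and PyPDF2 3.x is installed.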

You may notice similarities between the PyPDF2 operations and Python's built-in file operations. Keep in mind that this module is not perfect: it may be unable to work with some PDF files.
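One way to cope with files the module cannot handle is to extract page by page and record which pages failed. The sketch below assumes a reader object with the PyPDF2 3.x interface (a pages sequence whose items have extract_text()); the function name extract_pages is hypothetical, and catching every exception is a deliberate simplification because the failure modes vary from file to file.

```python
def extract_pages(reader):
    """Return a list of (page_number, text_or_None) for a PdfReader-like object."""
    results = []
    for number, page in enumerate(reader.pages):
        try:
            text = page.extract_text()
        except Exception:
            # Broad on purpose (assumption): any failure marks the page unreadable.
            text = None
        # Empty or whitespace-only text may mean a scanned, image-only page.
        if text is None or not text.strip():
            results.append((number, None))
        else:
            results.append((number, text))
    return results
```

Pages that come back as None can then be reported to the user or handed to another tool instead of stopping the whole run.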
