How to extract text from PDF in Python

PDF (Portable Document Format) is one of the most popular and widely used digital document formats.

In this article, we will see how to extract text from a PDF file in Python. For that, we will use the third-party Python module PyPDF2.

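To install PyPDF2 (the code in this article uses the method names from before PyPDF2 3.0, so we pin the version):

pip install "PyPDF2<3.0"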
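
If PyPDF2 is already installed, it can help to check which version you have, because the names used in the example that follows (PdfFileReader, numPages, getPage, extractText) were removed in PyPDF2 3.0. Here is a minimal sketch using the standard library's importlib.metadata (Python 3.8+). The helper name installed_version is only for illustration and is not part of PyPDF2.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("PyPDF2")
if found is None:
    print("PyPDF2 is not installed")
else:
    # Compare only the major version number, e.g. "2.12.1" -> 2
    major = int(found.split(".")[0])
    if major < 3:
        print(f"PyPDF2 {found}: legacy names such as PdfFileReader are available")
    else:
        print(f"PyPDF2 {found}: legacy names such as PdfFileReader were removed")
```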

import PyPDF2

# Create a PDF file object (opened in binary mode)
pdfFileOb = open('test.pdf', 'rb')

# Create a PDF reader object
pdfReader = PyPDF2.PdfFileReader(pdfFileOb)

# Print the number of pages in the PDF file
print(pdfReader.numPages)

# Create a page object for the first page (index 0)
pageOb = pdfReader.getPage(0)

# Extract and print the text of the page
print(pageOb.extractText())

# Close the PDF file object
pdfFileOb.close()

Output:

1

Take
Risks In Your Life

If
You Win, You Can Lead !



-
Swami Vivekananda

Now let's see what this code does.

pdfFileOb = open('test.pdf', 'rb') - We open test.pdf in binary mode and store the file object in pdfFileOb.
pdfReader = PyPDF2.PdfFileReader(pdfFileOb) - Here, we create an object of the PdfFileReader class of the PyPDF2 module, passing it the PDF file object, and get back a PDF reader object.
print(pdfReader.numPages) - The numPages property gives the number of pages in the PDF file. In our case, it is 1 (see the first line of the output).
pageOb = pdfReader.getPage(0) - The reader object's getPage() method takes a page number (starting from index 0) as its argument and returns the page object, an instance of the PageObject class of the PyPDF2 module.
print(pageOb.extractText()) - The page object's extractText() method extracts the text from the PDF page.
pdfFileOb.close() - Finally, we close the PDF file object.
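
The names above belong to the older PyPDF2 API. In PyPDF2 3.0 and later they were removed in favour of PdfReader, reader.pages and extract_text(). The same steps can be sketched with the newer names as follows. The helper names extract_page_texts and read_pdf are hypothetical and chosen only for this example. The PyPDF2 import sits inside read_pdf so that the text-collecting helper can be tried without the library installed.

```python
def extract_page_texts(reader):
    """Return a list with the extracted text of every page in `reader`.

    `reader` is expected to behave like PyPDF2.PdfReader: it exposes a
    `pages` sequence whose items have an `extract_text()` method.
    """
    # Fall back to "" if a page yields no text (empty string or None)
    return [page.extract_text() or "" for page in reader.pages]


def read_pdf(path):
    """Open the PDF at `path` and return the text of each page."""
    # Imported here so this module loads even if PyPDF2 is missing
    from PyPDF2 import PdfReader

    # The with-block closes the file even if extraction raises an error
    with open(path, "rb") as pdf_file:
        reader = PdfReader(pdf_file)
        return extract_page_texts(reader)
```

With PyPDF2 3.x installed, read_pdf('test.pdf') would return a list with one string per page, and len() of that list gives the page count that numPages reported in the original code.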

Note that PyPDF2 may make mistakes when extracting text from a PDF, and it may be unable to open some PDF files at all.
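
Because of this, it can be worth guarding the extraction step when processing several files. Below is a minimal sketch under a few assumptions: extract is any function you supply that takes a path and returns the extracted text, and the helper name extract_many is hypothetical. It catches exceptions broadly on purpose, since a damaged file can fail in different ways, and it also flags files that open fine but yield no text (for instance a scanned document that contains only images).

```python
def extract_many(paths, extract):
    """Run `extract(path)` on each path and separate successes from failures.

    Returns two dicts: path -> text, and path -> reason for failure.
    """
    texts, failures = {}, {}
    for path in paths:
        try:
            text = extract(path)
        except Exception as exc:  # broad on purpose: unreadable PDFs fail in many ways
            failures[path] = f"{type(exc).__name__}: {exc}"
            continue
        # A file can open without errors and still contain no extractable text
        if not text or not text.strip():
            failures[path] = "no extractable text"
        else:
            texts[path] = text
    return texts, failures
```

The failures dict can then be reviewed by hand, or those files can be passed to a different tool.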
