      How to extract text from PDF in Python

      • Posted by Python Point Team
      • Categories Python
      • Date December 31, 2022

      PDF (Portable Document Format) is one of the most widely used formats for sharing digital documents.

      In this article, we will see how to extract text from a PDF file in Python. For that, we use the third-party Python module PyPDF2.

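      To install PyPDF2, run the command below. The code in this article uses the older method names (PdfFileReader, numPages, getPage, extractText), which were removed in PyPDF2 3.0, so the install is pinned to an earlier release:

      pip install "PyPDF2<3.0"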

      import PyPDF2
      
      # Create a PDF file object
      pdfFileOb = open('test.pdf', 'rb')
      
      # Create a PDF reader object
      pdfReader = PyPDF2.PdfFileReader(pdfFileOb)
      
      # Print the number of pages in the PDF file
      print(pdfReader.numPages)
      
      # Create a page object for the first page
      pageOb = pdfReader.getPage(0)
      
      # Extract and print the text from the page
      print(pageOb.extractText())
      
      # Close the PDF file object
      pdfFileOb.close()

      Output:

      1
      
      Take
      Risks In Your Life
      
      If
      You Win, You Can Lead !
      
      
      
      -
      Swami Vivekananda

      Now let's see what this code does.

      • pdfFileOb = open('test.pdf', 'rb') - We open test.pdf in binary mode and save the file object as pdfFileOb.
      • pdfReader = PyPDF2.PdfFileReader(pdfFileOb) - We create an object of the PdfFileReader class of the PyPDF2 module, passing it the PDF file object, and get a PDF reader object back.
      • print(pdfReader.numPages) - The numPages property gives the number of pages in the PDF file. In our case it is 1 (see the first line of the output).
      • pageOb = pdfReader.getPage(0) - We get an object of the PageObject class of the PyPDF2 module. The PDF reader object has a getPage() function, which takes a page number (starting from index 0) as its argument and returns the page object.
      • print(pageOb.extractText()) - The page object has an extractText() function that extracts the text from the PDF page.
      • pdfFileOb.close() - We close the PDF file object.
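
If PyPDF2 3.x is installed, the camelCase names used above (PdfFileReader, numPages, getPage, extractText) are no longer available. The same steps can be sketched with the newer names PdfReader, reader.pages, and extract_text(). This is a minimal sketch, not the only way to do it: the helper names join_page_text and read_pdf_text are made up for illustration, and the file name test.pdf is taken from the example above.

```python
def join_page_text(pages):
    """Concatenate the text of every page object, separated by blank lines."""
    # Fall back to an empty string if a page yields no text.
    return "\n\n".join(page.extract_text() or "" for page in pages)


def read_pdf_text(path):
    """Open the PDF at `path` and return the text of all its pages."""
    # Imported inside the function so join_page_text stays usable
    # even where PyPDF2 is not installed.
    from PyPDF2 import PdfReader  # newer name for PdfFileReader

    # The with-block closes the file even if extraction raises an error,
    # so no explicit close() call is needed.
    with open(path, "rb") as pdf_file:
        reader = PdfReader(pdf_file)
        # len(reader.pages) replaces numPages; reader.pages[0] replaces getPage(0).
        return join_page_text(reader.pages)
```

Calling read_pdf_text('test.pdf') should return the text of every page as one string, instead of printing only the first page.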

      Note that PyPDF2 might make mistakes when extracting text from a PDF, and it may be unable to open some PDF files at all.
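
One simple way to notice extraction problems is to check which pages came back with no text at all, which can happen, for example, when a page contains only a scanned image. The sketch below assumes you have already collected the extracted text of each page in a list. find_blank_pages is a hypothetical helper written for this article, not part of PyPDF2.

```python
def find_blank_pages(page_texts):
    """Return the 0-based indexes of pages whose extracted text is empty."""
    blank = []
    for index, text in enumerate(page_texts):
        # Treat None and whitespace-only strings as "nothing extracted".
        if text is None or not text.strip():
            blank.append(index)
    return blank


# Example input: text as it might come back from three pages.
sample_texts = ["Take\nRisks In Your Life", "", "   \n"]
print(find_blank_pages(sample_texts))  # [1, 2]
```

A non-empty result does not prove the PDF is broken, but it is a hint that those pages need another tool or a manual check.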
