Getting PDF Version using Python


I need to extract the PDF version from a PDF document. I tried pdfminer, but it only provides the following information:

  1. PDF Producer
  2. Created
  3. Modified
  4. Application

Below is the code I tried:

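from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument

# Open in binary mode; the with block closes the file when done
with open("ibs.servlets.pdf", "rb") as fp:
    parser = PDFParser(fp)
    doc = PDFDocument(parser)
    parser.set_document(doc)
    # doc.info is a list of document information dictionaries
    if len(doc.info) > 0:
        info = doc.info[0]
        print(info)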

Are there any other libraries apart from pdfminer that I can use?


Solution

  • The PDF version is stored as a header comment on the first line of the PDF file (for example, %PDF-1.5). I couldn't find a way to get this information with pdfminer's PDFParser, but with PyPDF2 I could retrieve it manually:

    from PyPDF2 import PdfFileReader  # works on PyPDF2 1.x/2.x; PdfFileReader was removed in 3.0
    doc = PdfFileReader('ibs.servlets.pdf')
    doc.stream.seek(0)  # Rewind: the parser skips the header comment, so read it from the raw stream
    print(doc.stream.readline().decode())
    

    Output:

    %PDF-1.5
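
Since the version is just a comment at the start of the file, it can also be read without any PDF library. Below is a minimal sketch using only the standard library. The function name read_pdf_header_version and the scan_bytes parameter are my own (hypothetical) names, not part of pdfminer or PyPDF2. It assumes the header appears within the first 1024 bytes, which many readers tolerate even though the header normally sits on the very first line. Note that from PDF 1.4 onward a /Version entry in the document catalog may override the header value, and this sketch does not check that.

```python
import re
from typing import Optional

# Matches the header comment, e.g. b"%PDF-1.5", and captures "1.5"
_HEADER_RE = re.compile(rb"%PDF-(\d+\.\d+)")


def read_pdf_header_version(path: str, scan_bytes: int = 1024) -> Optional[str]:
    """Return the version from the %PDF-x.y header, or None if none is found.

    Hypothetical helper: only the file header is inspected; a /Version
    entry in the document catalog (PDF 1.4+) is not taken into account.
    """
    # Read only the beginning of the file in binary mode
    with open(path, "rb") as fp:
        head = fp.read(scan_bytes)
    match = _HEADER_RE.search(head)
    return match.group(1).decode("ascii") if match else None
```

Calling read_pdf_header_version("ibs.servlets.pdf") should return the version as a string such as "1.5", or None if the file does not start with a PDF header.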