I'm using Python 2.7 and pyPdf to get the title metadata from PDF files. Unfortunately, not all of the PDFs have this metadata. What I want to do now is grab the first two lines of text from a PDF instead. Using what I have now, how can I modify the code to capture the first two lines with pyPdf?
from pyPdf import PdfFileWriter, PdfFileReader
import os

for fileName in os.listdir('.'):
    try:
        if fileName.lower()[-3:] != "pdf": continue
        input1 = PdfFileReader(file(fileName, "rb"))
        # print the file name and its title metadata
        print fileName, input1.getDocumentInfo().title
    except:
        print ",",
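One way to combine both goals is to use the title when it exists and fall back to the first lines of text only when it doesn't. The sketch below keeps the PDF library out of the logic so it runs on its own. describe_pdf is a hypothetical helper name; it assumes you already have the title (which may be None or empty, and note that getDocumentInfo() itself can return None for a file with no info dictionary) and the extracted text of the first page, e.g. from getPage(0).extractText().

```python
def describe_pdf(title, text, n=2):
    """Hypothetical helper: return the metadata title, or the first n text lines.

    title: the value of getDocumentInfo().title (may be None or empty).
    text:  the extracted text of the first page (assumed to be a string).
    """
    if title and title.strip():
        return title.strip()
    # Fall back to the first n non-blank lines of the page text.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return " | ".join(lines[:n])


print(describe_pdf("Sample Title", "ignored"))
print(describe_pdf(None, "\nIntroduction\n\nChapter 1\nMore text\n"))
```

In the loop above, you would call it with input1.getDocumentInfo().title and the first page's text instead of printing the title directly.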
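With PyPDF2, you can extract the text of the first page and read its first two lines:

from PyPDF2 import PdfFileReader
import StringIO

fileName = "HMM.pdf"
try:
    if fileName.lower()[-3:] == "pdf":
        input1 = PdfFileReader(file(fileName, "rb"))
        # the title metadata is still available if you need it:
        #print fileName, input1.getDocumentInfo().title
        # extract all text from the first page (page 0)
        content = input1.getPage(0).extractText()
        buf = StringIO.StringIO(content)
        # readline() returns one line at a time; print the first two
        print buf.readline().strip()
        print buf.readline().strip()
except:
    print ",",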
My current working directory contains "HMM.pdf", and this code works properly on Python 2.7.
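The two buf.readline() calls above return each line with its trailing newline, return a blank line if the page text happens to start with one, and return an empty string once the text runs out. A slightly more forgiving sketch follows; first_lines is a hypothetical helper name, and it assumes content is the string returned by extractText(). Whether the page's visual lines come out as separate lines at all depends on the PDF, since extractText() may not preserve line breaks.

```python
import itertools


def first_lines(text, n=2):
    """Return up to n non-blank lines from text, stripped of whitespace.

    Unlike calling readline() twice, this skips blank lines, drops the
    trailing newline, and simply returns fewer items if the text is short.
    """
    non_blank = (line.strip() for line in text.splitlines() if line.strip())
    return list(itertools.islice(non_blank, n))


content = "\n  First heading\nSecond line\nBody text\n"
for line in first_lines(content):
    print(line)
```

Using itertools.islice stops reading as soon as n lines have been found, so a long page is not fully processed just to get its first two lines.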