I am trying to find a way to rename scanned PDFs, which are automatically given names like "397009900", to a certain string found inside the PDF itself. In my case it is a drawing name, e.g. "ISO-4024-4301", that I want to extract from the PDF and use as the file name.
Is there a way to automatically rename a PDF file with information from inside of it?
Thanks very much.
This can be done with Python.
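import PyPDF2

# Raw string so the backslash in the Windows path is not treated as an escape
with open(r'path_to_file\Test doc.pdf', 'rb') as p:
    # PdfReader, .pages and extract_text() replace the older PdfFileReader,
    # getPage() and extractText(), which PyPDF2 3.0 removed
    pdfReader = PyPDF2.PdfReader(p)
    pageObj = pdfReader.pages[0]  # first page; pages are numbered from 0
    info = pageObj.extract_text()
print(info)
You can choose which page to extract from by changing the index from 0 to the page you need (pages are numbered from 0, so 0 is the first page).
pageObj = pdfReader.pages[0]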
The extracted text is stored in the variable info. You can then use any string operation to pick out the text you want to use as the new file name.
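As a rough sketch of that selection step, a regular expression can pull the drawing name out of info. The pattern below is only an assumption based on the single example "ISO-4024-4301" from the question (the literal "ISO-" followed by two groups of four digits), and find_drawing_name is a hypothetical helper name, so adjust both to your real drawing numbers. Note that a scanned page with no text layer may come back as an empty string, in which case the function returns None and the file would need OCR first.

```python
import re

# Assumed pattern, taken from the one example in the question: ISO-4024-4301
DRAWING_PATTERN = re.compile(r"\bISO-\d{4}-\d{4}\b")


def find_drawing_name(text):
    """Return the first drawing name found in text, or None if there is none."""
    match = DRAWING_PATTERN.search(text)
    return match.group(0) if match else None


# Stand-in for the text extracted from the PDF page
info = "DRAWING NO. ISO-4024-4301  REV 2"
print(find_drawing_name(info))  # prints ISO-4024-4301
```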
import os
os.rename(r'old_file_path_and_name_with_extension',r'new_file_path_and_name_with_extension')
With the os module, you can easily rename the files!
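If you would rather not build the new path by hand, here is a minimal sketch using pathlib from the standard library. rename_pdf is a hypothetical helper, not part of any library: it keeps the file in the same folder, keeps the original extension, and refuses to overwrite an existing file. On Windows the PDF should be closed (that is, the with block above has finished) before renaming.

```python
from pathlib import Path


def rename_pdf(old_path, new_name):
    """Rename old_path to new_name in the same folder, keeping the extension."""
    old_path = Path(old_path)
    new_path = old_path.with_name(new_name + old_path.suffix)
    if new_path.exists():
        # Do not silently replace another drawing with the same name
        raise FileExistsError(f"{new_path} already exists")
    old_path.rename(new_path)
    return new_path


# Example call (paths are placeholders):
# rename_pdf(r'path_to_file\397009900.pdf', 'ISO-4024-4301')
```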