
Windows: automatically rename a PDF file using information from inside the PDF itself


I am trying to find a way to rename scanned PDFs, which are automatically given names like "397009900", to a certain string found inside the PDF itself. In my case, the string is a drawing name that I want to extract from the PDF and use as the file name, e.g. "ISO-4024-4301".

Is there a way to automatically rename a PDF file with information from inside of it?

Thanks very much.


Solution

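  • This can be done with Python, using the PyPDF2 package to read the text of a page.

    import PyPDF2

    # Raw string, so the backslash in the Windows path is not read as an escape
    with open(r'path_to_file\Test doc.pdf', 'rb') as p:
        # PyPDF2 3.0 removed the older PdfFileReader / getPage / extractText names
        pdfReader = PyPDF2.PdfReader(p)
        pageObj = pdfReader.pages[0]
        info = pageObj.extract_text()
        print(info)

    You can choose which page to extract from by changing the index. Pages are numbered from 0, so 0 is the first page.

    pageObj = pdfReader.pages[0]

    The extracted text is stored in the variable info, which is an ordinary string, so you can search or slice it to pick out the text you want to use as the new file name. Note that a scan only yields text if the PDF has a text layer; for a pure image scan, info will be empty and the file needs OCR first.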

    import os
    os.rename(r'old_file_path_and_name_with_extension',r'new_file_path_and_name_with_extension')
    

    With the os module, renaming the file is a single call to os.rename.
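
The two remaining steps, picking the drawing name out of the extracted text and renaming the file, might be combined roughly as follows. This is only a sketch under some assumptions: the pattern (three capital letters followed by two four-digit groups) is guessed from the single example "ISO-4024-4301", and the helper names find_drawing_name and rename_to_drawing_name are made up for illustration.

```python
import re
from pathlib import Path

# Assumed pattern, guessed from the example "ISO-4024-4301":
# three capital letters, then two groups of four digits.
# Adjust it to the real drawing-number scheme.
DRAWING_NAME = re.compile(r"\b[A-Z]{3}-\d{4}-\d{4}\b")


def find_drawing_name(text):
    """Return the first drawing name found in the extracted text, or None."""
    match = DRAWING_NAME.search(text)
    return match.group(0) if match else None


def rename_to_drawing_name(pdf_path, text):
    """Rename pdf_path after the drawing name in text and return the resulting path."""
    pdf_path = Path(pdf_path)
    name = find_drawing_name(text)
    if name is None:
        return pdf_path  # nothing recognizable, leave the file alone
    target = pdf_path.with_name(name + pdf_path.suffix)
    if target == pdf_path:
        return pdf_path  # already has the right name
    if target.exists():
        # Refuse to clobber another file that already uses this name
        raise FileExistsError(f"{target} already exists")
    return pdf_path.rename(target)
```

Here text would be the info string from the extraction step, and the call should be made after the with block has closed the PDF, since Windows will not rename a file that is still open. To process a whole folder, the same call can be made for each path returned by Path(folder).glob("*.pdf"). The explicit exists() check is there because two scans may contain the same drawing name.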