I'm trying to read a PDF file whose name may change; a preliminary script supplies the file name. I successfully save that file name to a variable, but when I try to open the file using that variable I get an error: "ValueError: embedded null byte"
I have tried a couple of solutions; for example, I attempted this solution, but I receive the same error. I have identified a workaround using glob, since I can predict the file name (I know there will always be exactly one PDF). However, I would like to avoid that approach if possible, in case we have to handle multiple PDFs in the future.
This is what I have:
pdfFileName = pdfFileName[132:220] # File path is correct, I have confirmed
objectPDF = open(pdfFileName,'rb')
pdfReader = PyPDF2.PdfFileReader(objectPDF)
pageObj = pdfReader.getPage(0)
print(pageObj.extractText())
My error is:
Traceback (most recent call last):
  File "verify.py", line 48, in <module>
    objectPDF = open(pdfFileName,'rb')
ValueError: embedded null byte
What I would like is for the text of the PDF to be printed to the console. The error is certainly in the way I'm opening the file: if I hard-code the file path it works as expected, but not when I use a variable that holds what looks like the exact same string.
Place this: pdfFileName = pdfFileName.replace('\0','')
before this: objectPDF = open(pdfFileName,'rb')
That code removes every null character ('\0') from the string, so open() no longer rejects the path and everything runs properly.
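To see why a path that prints correctly can still fail, here is a minimal, self-contained sketch. It assumes the extra characters are trailing nulls, which print() does not show but repr() does. The helper name strip_nulls and the file name report.pdf are hypothetical, made up for this illustration only.

```python
import os
import tempfile


def strip_nulls(path):
    """Return the path with every null character removed."""
    return path.replace('\0', '')


error_message = None

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a small stand-in file so the example can actually open something.
    real_path = os.path.join(tmp_dir, "report.pdf")  # hypothetical file name
    with open(real_path, "wb") as f:
        f.write(b"%PDF-1.4\n")

    # Simulate a file name that picked up null characters along the way.
    dirty_path = real_path + "\0\0"

    # repr() exposes the hidden '\x00' characters that print() would hide.
    print(repr(dirty_path))

    # Opening the dirty path raises ValueError before the file system is touched.
    try:
        open(dirty_path, "rb")
    except ValueError as exc:
        error_message = str(exc)
    print("open() failed with:", error_message)

    # After stripping the nulls, the same path opens normally.
    with open(strip_nulls(dirty_path), "rb") as f:
        header = f.read(8)
    print(header)
```

If repr(pdfFileName) shows '\x00' in your own script, it may also be worth checking where the slice [132:220] comes from, since the nulls are probably part of the data being sliced.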