I want to automatically convert PDF files to text and then save that output to a file on my desktop.
Example:
-- pdf converted text: "HELLO WORLD"
-- save a .txt file on the desktop containing "HELLO WORLD".
What I have tried:
fp = open('/Users/zain/Desktop', 'pdf_summary')
fp.write(text)
I expected this to save a file on the desktop containing the contents of text, the variable that holds the converted text.
Full Code:
from PyPDF2 import PdfReader
reader = PdfReader("/Users/zain/Desktop/Week2_POL305_Manfieldetal.pdf")
text = ""
for page in reader.pages:
    text += page.extract_text() + "\n"
print(text)
fp = open('/Users/zain/Desktop', 'pdf_summary')
fp.write(text)
A PDF may contain all sorts of things, not only text. You therefore have to extract the text explicitly, if that is what you want.
With the PyMuPDF package you could do it this way:
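import fitz  # PyMuPDF; newer versions also support "import pymupdf"
import pathlib
doc = fitz.open("input.pdf")
# Extract the text of every page and join the pages with newlines
text = "\n".join(page.get_text() for page in doc)
# Encode explicitly (UTF-8 by default) so non-ASCII text is written correctly
pathlib.Path("input.txt").write_bytes(text.encode())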
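As for the open() call in the question: the second argument of open() is the file mode, so 'pdf_summary' is read as a mode rather than as a file name, and the first argument points at the Desktop folder instead of a file inside it. Below is a minimal sketch of one way to write the extracted text to a .txt file. The helper name save_text, its parameters, and the default file name pdf_summary.txt are made up for illustration, and the target folder is assumed to exist already.

```python
from pathlib import Path


def save_text(text: str, directory: Path, filename: str = "pdf_summary.txt") -> Path:
    """Write text to directory/filename as UTF-8 and return the resulting path.

    save_text and its parameters are hypothetical names for this sketch;
    the directory is assumed to exist already.
    """
    target = Path(directory) / filename
    # write_text opens the file for writing, writes the string, and closes it
    target.write_text(text, encoding="utf-8")
    return target
```

With the text variable from the question, the call could look like save_text(text, Path.home() / "Desktop"), which should leave a pdf_summary.txt on the desktop, provided the Desktop folder sits directly under the home directory.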