I have PDF files for which PyPDF2 gave me the message "incorrect startxref pointer(1)". Now I want to repair these PDF files with Ghostscript and Python.
I installed the Python package with pip install ghostscript, as well as Ghostscript 10.01.1.
But now I'm lost. Even with the examples from python-ghostscript I don't know how to start. I found this syntax:
-o repaired.pdf ^
-sDEVICE=pdfwrite ^
-dPDFSETTINGS=/prepress ^
corrupted.pdf
Can someone help me?
import sys
from ghostscript import _gsprint as gs
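args = [
    b"gs",  # actual value doesn't matter
    # each command-line token must be its own list element,
    # so "-o" and the output file name are passed separately
    b"-o", b"repaired.pdf",
    b"-sDEVICE=pdfwrite",
    b"-dPDFSETTINGS=/prepress",
    b"corrupted.pdf",
]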
instance = gs.new_instance()
code = gs.init_with_args(instance, args)
code1 = gs.exit(instance)
if code == 0 or code == gs.e_Quit:
    code = code1
gs.delete_instance(instance)
if not (code == 0 or code == gs.e_Quit):
    sys.exit(1)
Assuming you have Ghostscript installed and on your PATH, you don't need a Python wrapper for it.
Maybe something like this:
import shutil
import subprocess


def repair_pdf(in_file, out_file):
    gs = shutil.which("gs")
    if not gs:
        raise RuntimeError("Ghostscript not found")
    subprocess.check_call(
        [
            gs,
            "-dSAFER",
            "-dNOPAUSE",
            "-dBATCH",
            "-sDEVICE=pdfwrite",
            "-o",
            out_file,
            in_file,
        ]
    )


repair_pdf("corrupted.pdf", "repaired.pdf")
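Since the question mentions several files and uses Windows-style `^` line continuations, here is a rough sketch of how the same idea could be extended. On Windows the console executable is usually named gswin64c (or gswin32c) rather than gs, so the lookup tries several names. The helpers find_ghostscript, build_command, and repair_all are hypothetical names made up for this example, not part of any library, and the folder names in the usage note are placeholders.

```python
import shutil
import subprocess
from pathlib import Path

# Executable names to try; gswin64c/gswin32c are the Windows console builds.
CANDIDATES = ("gs", "gswin64c", "gswin32c")


def find_ghostscript():
    """Return the path of the first Ghostscript executable on PATH, or None."""
    for name in CANDIDATES:
        path = shutil.which(name)
        if path:
            return path
    return None


def build_command(gs, in_file, out_file):
    """Build the argument list; -o already implies -dBATCH and -dNOPAUSE."""
    return [gs, "-dSAFER", "-sDEVICE=pdfwrite", "-o", str(out_file), str(in_file)]


def repair_all(in_dir, out_dir):
    """Rewrite every *.pdf in in_dir into out_dir; return the names that failed."""
    gs = find_ghostscript()
    if gs is None:
        raise RuntimeError("Ghostscript not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for pdf in sorted(Path(in_dir).glob("*.pdf")):
        # A non-zero exit code is recorded instead of aborting the whole batch.
        result = subprocess.run(build_command(gs, pdf, out_dir / pdf.name))
        if result.returncode != 0:
            failed.append(pdf.name)
    return failed
```

Calling repair_all("broken", "fixed") would then write a rewritten copy of each PDF from the broken folder into the fixed folder and return the list of files Ghostscript could not process. Whether a given file is actually recoverable depends on how badly it is damaged.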