Search code examples
pythonpdfghostscript

Python Ghostscript Repair pdf


I have pdf Files where I recieved the message from PyPDF2 "incorrect startxref pointer(1)". Now I want to repair the pdf Files with Ghostscript and Python.

I installed:

pip install ghostscript ghostscript 10.01.1

But now I´m lost. Even with the Examples of Python-ghostscript I don´t now how to start. I found the syntax

  -o repaired.pdf ^
  -sDEVICE=pdfwrite ^
  -dPDFSETTINGS=/prepress ^
   corrupted.pdf

Can someone help me?

import sys
from ghostscript import _gsprint as gs

args = [
    b"gs",  # actual value doesn't matter
    b"-o repaired.pdf",
    b"-sDEVICE=pdfwrite",
    b"-dPDFSETTINGS=/press",
    b"corrupted.pdf"
    ]

instance = gs.new_instance()
code = gs.init_with_args(instance, args)
code1 = gs.exit(instance)
if code == 0 or code == gs.e_Quit:
    code = code1
gs.delete_instance(instance)
if not (code == 0 or code == gs.e_Quit):
    sys.exit(1)

Solution

  • Assuming you have GhostScript installed and on your path, you don't need a Python wrapper for it.

    Maybe something like

    import shutil
    import subprocess
    
    
    def repair_pdf(in_file, out_file):
        gs = shutil.which("gs")
        if not gs:
            raise RuntimeError("Ghostscript not found")
        subprocess.check_call(
            [
                gs,
                "-dSAFER",
                "-dNOPAUSE",
                "-dBATCH",
                "-sDEVICE=pdfwrite",
                "-o",
                out_file,
                in_file,
            ]
        )
    
    
    repair_pdf("corrupted.pdf", "repaired.pdf")