Search code examples
pythonghostscript

How do you use Python Ghostscript's high-level interface to convert a .pdf file into multiple .png files?


I am trying to convert a .pdf file into several .png files using Ghostscript in Python. The other answers on here were pretty old hence this new thread.

The following code was given as an example on pypi.org of the 'high level' interface, and I am trying to model my code after the example code below.

import sys
import locale
import ghostscript

args = [
    "ps2pdf", # actual value doesn't matter
    "-dNOPAUSE", "-dBATCH", "-dSAFER",
    "-sDEVICE=pdfwrite",
    "-sOutputFile=" + sys.argv[1],
    "-c", ".setpdfwrite",
    "-f",  sys.argv[2]
    ]

# arguments have to be bytes, encode them
encoding = locale.getpreferredencoding()
args = [a.encode(encoding) for a in args]

ghostscript.Ghostscript(*args)

Can someone explain what this code is doing? And can it be used somehow to convert a .pdf into .png files?

I am new to this and am truly confused. Thanks so much!


Solution

  • That's calling Ghostscript, obviously. From the arguments it's not spawning a process, it's linked (either dynamically or statically) to the Ghostscript library.

    The args are Ghostscript arguments. These are documented in the Ghostscript documentation, you can find it online here. Because it mimics the command line interface, where the first argument is the calling program, the first argument here is meaningless and can be anything you want (as the comment says).

    The next three arguments turn on SAFER (which prevents some potentially dangerous operations and is, now, the default anyway), sets NOPAUSE so the entire input is processed without pausing between pages, and BATCH so that on completion Ghostscript exits instead of returning to the interactive prompt.

    Then it selects a device. In Ghostscript (due to the PostScript language) devices are what actually output stuff. In this case the device selected is the pdfwrite device, which outputs PDF.

    Then there's the OutputFile, you can probably guess that this is the name (and path) of the file where the output is to be written.

    The next 3 arguments; -c .setpdfwrite -f are, frankly archaic and pointless. They were once recommended when using the pdfwrite device (and only the pdfwrite device) but they have no useful effect these days.

    The very last argument is, of course, the input file.

    Certainly you can use Ghostscript to render PDF files to PNG. You want to use one of the PNG devices, there are several depending on what colour depth you want to support. Unless you have some stranger requirement, just use png16m. If your input file contains more than one page you'll want to set the OutputFile to use %d so that it writes one file per page.

    More details on all of this can, of course, be found in the documentation.