Tags: pdf, imagemagick, ghostscript, poppler

Remove / Delete all images from a PDF using Ghostscript or ImageMagick


I want to delete all the images in a PDF, leaving only the text/fonts, using any command-line tool that can do it.

I tried using -dGraphicsAlphaBits=1 in a Ghostscript command, but the images are still present, just rendered like big pixels.


Solution

  • No, AFAIK, it's not possible to remove all images from a PDF with a command-line tool.

    What's the purpose of your request anyway? To save on file size? To remove information contained in the images? Or something else?

    Workaround

    Whatever your aim, here is a command that downsamples all images to a resolution of 2 ppi (Update: 1 ppi doesn't work). This achieves two goals at once:

    • reduce file size
    • make all images basically incomprehensible

    Here's how to do it selectively, for only the images on page 33 of original.pdf (note that the output file then contains only that page):

    gs                               \
      -o images-uncomprehendable.pdf \
      -sDEVICE=pdfwrite              \
      -dDownsampleColorImages=true   \
      -dDownsampleGrayImages=true    \
      -dDownsampleMonoImages=true    \
      -dColorImageResolution=2       \
      -dGrayImageResolution=2        \
      -dMonoImageResolution=2        \
      -dFirstPage=33                 \
      -dLastPage=33                  \
       original.pdf
    

    If you want to do it for all images on all pages, just skip the -dFirstPage and -dLastPage parameters.
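The single-page and all-pages variants can be wrapped in one small shell function. This is only a sketch: the function name `blur_images` and its argument order are hypothetical, the resolution defaults to the 2 ppi that was found to work, and the `GS` variable is an assumed convenience for using a different Ghostscript binary name (e.g. `gswin64c` on Windows).

```shell
#!/bin/sh
# Hypothetical helper, not part of Ghostscript:
#   blur_images IN.pdf OUT.pdf [PPI] [PAGE]
# Downsamples color, gray and mono images to PPI (default 2).
# If PAGE is given, only that page is processed and written to OUT.pdf.
blur_images() {
  in=$1
  out=$2
  ppi=${3:-2}
  page=$4
  # Reuse the function's positional parameters for the optional page range.
  if [ -n "$page" ]; then
    set -- -dFirstPage="$page" -dLastPage="$page"
  else
    set --
  fi
  "${GS:-gs}" -o "$out" -sDEVICE=pdfwrite \
    -dDownsampleColorImages=true \
    -dDownsampleGrayImages=true \
    -dDownsampleMonoImages=true \
    -dColorImageResolution="$ppi" \
    -dGrayImageResolution="$ppi" \
    -dMonoImageResolution="$ppi" \
    "$@" "$in"
}
```

After sourcing it, `blur_images original.pdf images-uncomprehendable.pdf 2 33` reproduces the page-33 command, and `blur_images original.pdf images-uncomprehendable.pdf` processes every page.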

    If you want to remove all color information from the images, convert them to grayscale in the same command (search other answers on Stack Overflow, where the details are discussed).
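For the grayscale part, one commonly used option is the pdfwrite device's `ColorConversionStrategy` setting (together with `ProcessColorModel`), combined with the downsampling options above. A minimal sketch, assuming a reasonably recent Ghostscript. Note that this converts the whole document to gray (text and vector graphics too), not only the images. The function name `blur_images_gray` and the `GS` override are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: blur_images_gray IN.pdf OUT.pdf
# Downsamples all images to 2 ppi and converts the whole document
# (images, text and vector graphics) to grayscale.
blur_images_gray() {
  "${GS:-gs}" -o "$2" -sDEVICE=pdfwrite \
    -sColorConversionStrategy=Gray \
    -dProcessColorModel=/DeviceGray \
    -dDownsampleColorImages=true \
    -dDownsampleGrayImages=true \
    -dDownsampleMonoImages=true \
    -dColorImageResolution=2 \
    -dGrayImageResolution=2 \
    -dMonoImageResolution=2 \
    "$1"
}
```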


    Update: I originally proposed a resolution of 1 PPI, but that doesn't seem to work with Ghostscript. I have now tested 2 PPI, which works.


    Update 2: See also the following (new) question and its answer:

    It provides some sample PostScript code which completely removes all (raster) images from the PDF, leaving the rest of the page layout unchanged.

    It also reflects the expanded capabilities of newer Ghostscript versions, which can now selectively remove all text, all raster images, or all vector objects from a PDF, or any combination of these three types.
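As a sketch of that newer approach: assuming Ghostscript 9.23 or newer, the switches `-dFILTERIMAGE`, `-dFILTERTEXT` and `-dFILTERVECTOR` drop raster images, text and vector graphics respectively, and they also work with the pdfwrite device. The function below (its name `strip_images` and the `GS` override are hypothetical conveniences) removes only the raster images, which is what the question asked for. Any extra arguments are passed straight to gs, so further object types can be filtered as well.

```shell
#!/bin/sh
# Hypothetical helper: strip_images IN.pdf OUT.pdf [EXTRA_GS_OPTIONS...]
# Rewrites IN.pdf with pdfwrite, dropping all raster images (-dFILTERIMAGE).
# Further arguments, e.g. -dFILTERVECTOR, are passed through to gs.
strip_images() {
  if [ $# -lt 2 ]; then
    echo "usage: strip_images IN.pdf OUT.pdf [gs options...]" >&2
    return 2
  fi
  in=$1
  out=$2
  shift 2
  "${GS:-gs}" -o "$out" -sDEVICE=pdfwrite -dFILTERIMAGE "$@" "$in"
}
```

For example, `strip_images original.pdf no-images.pdf` keeps text and vector graphics, while `strip_images original.pdf text-only.pdf -dFILTERVECTOR` keeps only the text.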