
How to keep margins when converting PDF to EPS outlines using Ghostscript?


I'm using Ghostscript to convert a PDF document into an EPS file.

My goal is to remove the textual information in the PDF while keeping the vector outlines of the text intact. I am doing so by converting to EPS and then converting it back to PDF. (Of course, I don't expect to prevent people from running OCR to get the text.)

The command I used was:

gs -q -dNOCACHE -dNOPAUSE -dBATCH -dSAFER \
   -sDEVICE=epswrite -sOutputFile=output.eps input.pdf

But when I convert the resulting EPS back to PDF, the original margin is mostly lost, the page size shrinks, and text on even-numbered pages is cropped on the right.

Is there a way to keep the original page size and margin during the conversion?

Another tool I tried was ps2eps.

While it supports specifying a page size, it does not actually remove the textual information, so one could still select and copy text from the resulting PDF. This defeats my purpose.

Another drawback is that it only supports converting a single page, so I have to first convert my PDF to a set of single-page PS files using psselect.


Solution

  • Firstly, don't use epswrite. Recent versions of Ghostscript no longer include it, so you must be using an old version; upgrade. You should be using the eps2write device instead.

    Secondly, don't convert PDF->EPS->PDF.

    Each conversion costs you accuracy. This matters doubly if you intend to maintain page-level information (like margins). EPS files are deliberately intended to have a tight bounding box, among other requirements that probably make the format unsuitable for your purposes.

    If you want to maintain the page level data, then convert to PostScript, not EPS, using the ps2write device.

    Note that when using the epswrite device, you are not 'removing the textual information (while keeping the vector outlines of the text intact)', but in the general case you are rendering the text to bitmaps. Ugly, and doesn't scale well!

    To do this sensibly, use a current version of Ghostscript (9.16 at the time of writing), use the pdfwrite device (with PDF in, PDF out), and set the -dNoOutputFonts switch.

    This will do what you seem to want: it will draw the text as vectors, not text. The result will, of course, be a PDF file which is unsearchable and immune to copy/paste.
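
The two recommendations above can be sketched as a pair of small shell functions. This is only a sketch: the function names outline_pdf and pdf_to_ps and the file names are hypothetical, and the general switches (-q, -dNOPAUSE, -dBATCH, -dSAFER) are simply carried over from the command in the question. It assumes a Ghostscript version recent enough to support -dNoOutputFonts.

```shell
# Hypothetical helper: PDF in, PDF out, with text drawn as vector
# outlines instead of being stored as text (pdfwrite + -dNoOutputFonts).
# $1 = input PDF, $2 = output PDF (placeholder names).
outline_pdf() {
    gs -q -dNOPAUSE -dBATCH -dSAFER \
       -sDEVICE=pdfwrite -dNoOutputFonts \
       -sOutputFile="$2" "$1"
}

# Hypothetical helper: convert to PostScript rather than EPS, which
# keeps page-level data such as the page size (ps2write device).
# $1 = input PDF, $2 = output PS file (placeholder names).
pdf_to_ps() {
    gs -q -dNOPAUSE -dBATCH -dSAFER \
       -sDEVICE=ps2write \
       -sOutputFile="$2" "$1"
}

# Example calls (commented out so that sourcing this file runs nothing):
#   outline_pdf input.pdf output.pdf
#   pdf_to_ps   input.pdf output.ps
```

Because outline_pdf goes straight from PDF to PDF, there is no intermediate EPS step to tighten the bounding box, so the page size and margins should come through unchanged.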