Tags: pdf, ghostscript, tiff, imagemagick-convert

Quality of TIFF output: ImageMagick vs. Ghostscript


I'm currently working on a Google Tesseract OCR workflow. There are two options for generating TIFFs from a PDF:

  1. Ghostscript:

    gswin64c.exe -r300x300 -dBATCH -dNOPAUSE -sDEVICE=tiff24nc -sOutputFile=thetif.tif -sCompression=lzw thepdf.pdf -c quit -q

  2. ImageMagick (convert):

    convert -background white -alpha off -density 300 thepdf.pdf -depth 8 -compress zip thetif.tif

For an (arbitrary) example file, the TIFF extracted by gswin64c is about five times as large as the one produced by convert. Nevertheless, the text is much smoother and of higher quality with convert (!) than with gswin64c. So I would prefer to use convert, but unfortunately it takes about four times as long as gswin64c to extract, e.g., 30 pages from a multipage PDF (170 sec vs. 40 sec).

Is there any way to improve the quality of gswin64c (without greatly enlarging the output files) or to speed up convert?


Solution

  • To me this appears to be the usual trade-off of speed versus quality. You like the quality of convert, but it's too slow; you like Ghostscript's speed, but you feel the quality is lower.

    Surely that would suggest that you can't have both?

    Anyway, do you realise that ImageMagick's convert calls Ghostscript to render the PDF file? So whichever route you use, you are using Ghostscript.

    It is (of course) entirely possible that convert is post-processing the image, but I suspect it is not. If you look into how convert works, you can probably find out what command line it is feeding to Ghostscript and use that.

    It also looks like convert is using a different compression filter (Flate instead of LZW), and it may be specifying anti-aliasing. You can get anti-aliasing in Ghostscript either by setting TextAlphaBits and GraphicsAlphaBits or by using the tiffscaled devices.

    Of course, using anti-aliasing will result in smoother text (if you like blurred text), but it will take longer.
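
    As a rough sketch of the two anti-aliasing routes mentioned above, the question's Ghostscript command could be adapted along the following lines. The function names, the GS variable, the alpha-bits value of 4, and the 600 dpi / factor 2 combination are illustrative assumptions, not settings known to match what convert uses; the file names are the ones from the question.

```shell
#!/bin/sh
# Ghostscript executable. gswin64c.exe comes from the question;
# on Linux/macOS this would typically be "gs" (set GS to override).
GS="${GS:-gswin64c.exe}"

# Variant 1: same tiff24nc device as in the question, plus anti-aliasing.
# Allowed values for the alpha-bits options are 1, 2 and 4; 4 is an
# assumed choice here and gives the strongest smoothing.
render_antialiased() {
    # $1 = input PDF, $2 = output TIFF
    "$GS" -q -dBATCH -dNOPAUSE -r300 \
        -sDEVICE=tiff24nc -sCompression=lzw \
        -dTextAlphaBits=4 -dGraphicsAlphaBits=4 \
        -sOutputFile="$2" "$1"
}

# Variant 2: tiffscaled24 renders at the given resolution (600 dpi,
# an assumed value) and then downscales by DownScaleFactor, so the
# resulting TIFF is again 300 dpi, smoothed by the downscaling.
render_downscaled() {
    # $1 = input PDF, $2 = output TIFF
    "$GS" -q -dBATCH -dNOPAUSE -r600 -dDownScaleFactor=2 \
        -sDEVICE=tiffscaled24 -sCompression=lzw \
        -sOutputFile="$2" "$1"
}
```

    Calling `render_antialiased thepdf.pdf thetif.tif` or `render_downscaled thepdf.pdf thetif.tif` would then produce a multipage TIFF; which variant gets closer to the convert output, and at what cost in time and file size, would have to be measured on the actual files. To compare against what convert passes to Ghostscript, `convert -list delegate` prints the delegate command templates it is configured with.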