pdf pdf-generation postscript

How does PS/PDF store and compress bitmaps?


I am experimenting with a system that scans letters and converts the scanned bitmaps to PDF, aiming for high resolution and a small PDF file size.

I am prototyping with a scanner, GIMP for bitmap manipulation, and ImageMagick for bitmap-to-PDF conversion.

My process looks as follows:

  • Scan in 3x8-bit color at 600 DPI. The LZW-compressed true-color TIFF file is around 8 MB.

  • Use GIMP to convert the bitmap to an indexed image with a typical color table of 4 to 8 colors, which makes the image compress better.

  • Use ImageMagick to convert the LZW-compressed indexed TIFF file to PDF, which comes to around 500 KB per page.

To shrink the file further, I could make the bitmap itself more compression-friendly. Before experimenting, I would like to know how PS/PDF stores bitmaps.

Are bitmaps in PS/PDF run-length-encoded? If so, I would gain compression by removing single pixels from bitmap rows.
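To make the run-length idea concrete, here is a minimal Python sketch that compares one clean row of an indexed image with the same row carrying isolated single pixels. It assumes the byte-oriented, PackBits-style scheme of the PDF RunLengthDecode filter and uses zlib as a stand-in for Flate compression. The row width, the stroke position, and the speckle positions are all made up for illustration, and a single row says little about a full page.

```python
import random
import zlib


def run_length_encode(data: bytes) -> bytes:
    """PackBits-style encoding as used by the PDF RunLengthDecode filter."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # Measure the run of identical bytes starting at i (at most 128).
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            out.append(257 - run)  # length byte 129..255: repeat next byte
            out.append(data[i])
            i += run
        else:
            # Collect literal bytes until a run of 2+ begins (at most 128).
            start = i
            i += 1
            while (i < n and i - start < 128
                   and not (i + 1 < n and data[i] == data[i + 1])):
                i += 1
            out.append(i - start - 1)  # length byte 0..127: copy literals
            out += data[start:i]
    out.append(128)  # end-of-data marker
    return bytes(out)


def run_length_decode(encoded: bytes) -> bytes:
    """Inverse of run_length_encode, used here only as a sanity check."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        length = encoded[i]
        if length == 128:
            break
        if length < 128:
            out += encoded[i + 1:i + 2 + length]
            i += 2 + length
        else:
            out += bytes([encoded[i + 1]]) * (257 - length)
            i += 2
    return bytes(out)


WIDTH = 4960  # hypothetical row width in pixels, one byte per pixel

# Clean row: background color 0 with one 40-pixel stroke of color 1.
clean_row = bytearray(WIDTH)
clean_row[1000:1040] = b"\x01" * 40

# Noisy row: the same row plus 50 isolated speckle pixels of color 2.
noisy_row = bytearray(clean_row)
for x in random.Random(0).sample(range(WIDTH), 50):
    noisy_row[x] = 2

for name, row in (("clean", bytes(clean_row)), ("noisy", bytes(noisy_row))):
    print(name,
          "RLE bytes:", len(run_length_encode(row)),
          "Flate bytes:", len(zlib.compress(row, 9)))
```

In this toy case the speckled row comes out larger under both schemes, which suggests that despeckling can help whichever filter ends up in the PDF, not only run-length encoding.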

Do you have ideas for further optimizing here?

Do you know of any references on the bitmap storage format in PS/PDF?


Solution

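  • PDF is not limited to run-length encoding. It supports several image compression filters, including run-length, LZW, Flate (zlib/deflate), CCITT fax, JBIG2, JPEG (DCT), and JPEG 2000; see: http://en.wikipedia.org/wiki/Pdf#Raster_images

    I think you can choose which one is used with ImageMagick's -compress option: http://www.imagemagick.org/script/command-line-options.php#compress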
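As a rough way to try this out, the following Python sketch builds one ImageMagick command per compression type and reports the resulting PDF sizes. It assumes the ImageMagick 6 `convert` binary is on the PATH (ImageMagick 7 installs it as `magick`). The file name `page.tif`, the output naming, and the helper names are hypothetical. Only `-compress` and its `Zip`, `LZW`, and `RLE` values come from the ImageMagick documentation.

```python
import os
import shutil
import subprocess

# A few values accepted by ImageMagick's -compress option. Group4 and Fax
# also exist, but they are bilevel (black-and-white) schemes, so they only
# suit pages reduced to two colors.
COMPRESSION_TYPES = ["Zip", "LZW", "RLE"]


def build_command(src, dst, compression, binary="convert"):
    """Return the argument list for one TIFF-to-PDF conversion."""
    return [binary, src, "-compress", compression, dst]


def convert_all(src, out_dir=".", binary="convert"):
    """Convert src once per compression type; return {type: size in bytes}."""
    sizes = {}
    for compression in COMPRESSION_TYPES:
        dst = os.path.join(out_dir, f"page-{compression.lower()}.pdf")
        subprocess.run(build_command(src, dst, compression, binary), check=True)
        sizes[compression] = os.path.getsize(dst)
    return sizes


if __name__ == "__main__":
    source = "page.tif"  # hypothetical indexed TIFF exported from GIMP
    if shutil.which("convert") and os.path.exists(source):
        for compression, size in convert_all(source).items():
            print(f"{compression}: {size} bytes")
    else:
        print("Skipping: needs the ImageMagick 'convert' binary and page.tif")
```

Comparing the sizes on a real scanned page is probably more informative than reasoning about any one filter in isolation.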