
Storing jpg images into a pdf file in a "lossless" way


Given a directory with several jpg files (photos), I would like to create a single pdf file with one photo per page. However, I would like the photos to be stored in the pdf file unchanged; i.e., I would like to avoid decoding and re-encoding. So ideally I would like to be able to extract the original jpg files (maybe minus the metadata) from the pdf file using, e.g., a Linux command line tool like pdfimages.
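One way to check the "extract the original" criterion afterwards is to compare an extracted file (for example one written by pdfimages -j) with the original after dropping the metadata segments (APP0 to APP15 and COM) from both. Below is a minimal Ruby sketch of that check; the function name strip_jpeg_metadata and the script layout are hypothetical, and the parser assumes a well-formed JPEG with no fill bytes or standalone markers before the start-of-scan (SOS) marker.

```ruby
# Hypothetical helper: remove APPn (0xE0..0xEF) and COM (0xFE) segments from
# JPEG bytes so two files can be compared on their image data only.
# Assumes a well-formed JPEG without fill bytes between marker segments.
def strip_jpeg_metadata(data)
  data = data.b
  soi = "\xFF\xD8".b
  raise ArgumentError, 'not a JPEG (missing SOI marker)' unless data.start_with?(soi)
  out = soi.dup
  pos = 2
  while pos < data.bytesize
    raise ArgumentError, 'truncated JPEG' if pos + 4 > data.bytesize
    raise ArgumentError, "expected a marker at byte #{pos}" unless data.getbyte(pos) == 0xFF
    marker = data.getbyte(pos + 1)
    if marker == 0xDA
      # Start of scan: keep the scan header, entropy-coded data and EOI unchanged
      out << data.byteslice(pos, data.bytesize - pos)
      break
    end
    len = data.byteslice(pos + 2, 2).unpack1('n') # length includes its own 2 bytes
    metadata = (0xE0..0xEF).cover?(marker) || marker == 0xFE
    out << data.byteslice(pos, len + 2) unless metadata
    pos += len + 2
  end
  out
end

# Usage (hypothetical script name): ruby compare_jpeg.rb original.jpg extracted.jpg
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  a, b = ARGV.map { |path| strip_jpeg_metadata(File.binread(path)) }
  puts(a == b ? 'identical image data' : 'image data differs')
end
```

If the two stripped byte strings are equal, the JPEG data went into the pdf without being decoded and re-encoded.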

My ideas so far:

  • ImageMagick convert. However, I am confused by the compression options: if I choose 100% quality, does that mean the jpg is decoded internally and then re-encoded losslessly? (That is obviously not what I want.)
  • pdflatex. Some people claim that the graphics package includes images losslessly, while others dispute that. In any case, pdflatex would be slightly more cumbersome (I would first have to find out the dimensions of the photos, then set the page size accordingly, and make sure that there are no margins, headers, etc.).
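For the pdflatex route, the pixel dimensions can be read from each file's header without decoding the image. The Ruby sketch below (jpeg_dimensions is a hypothetical name) walks the JPEG marker segments until it reaches a start-of-frame (SOFn) segment, which stores the height and then the width as 16-bit big-endian values. It assumes a well-formed file without fill bytes between segments.

```ruby
# Hypothetical helper: return [width, height] in pixels from the first SOFn
# segment (0xC0..0xCF, excluding DHT 0xC4, JPG 0xC8 and DAC 0xCC).
# Assumes a well-formed JPEG without fill bytes between marker segments.
def jpeg_dimensions(data)
  data = data.b
  raise ArgumentError, 'not a JPEG (missing SOI marker)' unless data.start_with?("\xFF\xD8".b)
  pos = 2
  while pos + 4 <= data.bytesize
    raise ArgumentError, "expected a marker at byte #{pos}" unless data.getbyte(pos) == 0xFF
    marker = data.getbyte(pos + 1)
    len = data.byteslice(pos + 2, 2).unpack1('n') # length includes its own 2 bytes
    if (0xC0..0xCF).cover?(marker) && ![0xC4, 0xC8, 0xCC].include?(marker)
      # SOF layout after the length: precision (1 byte), height (2), width (2)
      height, width = data.byteslice(pos + 5, 4).unpack('nn')
      return [width, height]
    end
    raise ArgumentError, 'reached image data before a SOF segment' if marker == 0xDA
    pos += len + 2
  end
  raise ArgumentError, 'no SOF segment found'
end

# Usage (hypothetical script name): ruby jpeg_dims.rb *.jpg
if __FILE__ == $PROGRAM_NAME
  ARGV.each do |path|
    width, height = jpeg_dimensions(File.binread(path))
    puts "#{path}: #{width} x #{height}"
  end
end
```

Turning pixels into a page size still needs a resolution (or simply a fixed page width with the height derived from the aspect ratio); that choice is left open here.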

Solution

  • You could use the following small script which relies on HexaPDF (note: I'm the author of HexaPDF) to do this.

    Note: Make sure you have Ruby 2.4 (or newer) installed, then run gem install hexapdf to install HexaPDF.

    Here is the script:

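    require 'hexapdf'

    doc = HexaPDF::Document.new

    ARGV.each do |image_file|
      # Add the image to the document; JPEG data is embedded as-is (DCTDecode)
      image = doc.images.add(image_file)
      page = doc.pages.add
      # Image size in pixels and page (media box) size in PDF points
      iw = image.info.width.to_f
      ih = image.info.height.to_f
      pw = page.box(:media).width.to_f
      ph = page.box(:media).height.to_f
      # Largest scale factor that still fits the whole image on the page
      rw, rh = pw / iw, ph / ih
      ratio = [rw, rh].min
      iw, ih = iw * ratio, ih * ratio
      # Offsets that center the scaled image on the page
      x, y = (pw - iw) / 2, (ph - ih) / 2
      page.canvas.image(image, at: [x, y], width: iw, height: ih)
    end

    doc.write('images.pdf')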
    

    Just supply the images as arguments on the command line; the output file will be named images.pdf. Most of the code deals with scaling and centering the images so that they fit nicely onto the pages.
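The scaling and centering arithmetic from the script can be pulled out into a plain Ruby function and checked without HexaPDF. This is a sketch; fit_and_center is a hypothetical name, not part of HexaPDF's API.

```ruby
# Hypothetical helper mirroring the script's arithmetic: scale an iw x ih image
# to fit inside a pw x ph page without distortion, then center it.
# Returns [x, y, width, height] in the page's units.
def fit_and_center(iw, ih, pw, ph)
  # The smaller ratio keeps the whole image visible on both axes
  ratio = [pw / iw.to_f, ph / ih.to_f].min
  w = iw * ratio
  h = ih * ratio
  # Split the leftover space evenly on both sides
  [(pw - w) / 2.0, (ph - h) / 2.0, w, h]
end

p fit_and_center(400, 300, 800, 800) # => [0.0, 100.0, 800.0, 600.0]
```

Taking the minimum of the two ratios means a photo whose aspect ratio differs from the page's gets blank bands on one axis; taking the maximum instead would fill the page but crop the photo.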