Tags: python, ghostscript, pdfrw

Ghostscript or Python: how to combine PDFs with different page sizes into one PDF with a uniform page size?


I searched Stack Overflow for this problem. The closest questions I found are:
How to set custom page size with Ghostscript
How to convert multiple, different-sized PostScript files to a single PDF?

But neither of these solved my problem.

The question is simple:
How can we combine multiple PDFs with different page sizes into a single PDF in which all pages have the same size?

Example:
The two input PDFs are:
hw1.pdf, a single page of size 5.43 x 3.26 inches (as reported by Adobe Reader)
hw6.pdf, a single page of size 5.43 x 6.51 inches

The PDFs can be found here:
https://github.com/bhishanpdl/Questions

The code is:

gs -sDEVICE=pdfwrite -r720 -g2347x3909 -dPDFFitPage -o homeworks.pdf hw1.pdf hw6.pdf

PROBLEM: In the output, the first page is portrait and the second page is landscape.
QUESTION: How can we make both pages portrait?

NOTE:
-r720 sets the resolution to 720 pixels per inch.
The size -g2347x3909 was computed with this Python script:

import numpy as np

wd = int(np.floor(720 * 5.43))  # 3909 pixels
ht = int(np.floor(720 * 3.26))  # 2347 pixels

gsize = '-g' + str(ht) + 'x' + str(wd) + ' '
# this gives:  gsize = '-g2347x3909 '
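
The same arithmetic can be wrapped in a small helper. This is only a sketch: gs_geometry is a hypothetical name, and it assumes the page size is given in inches as width by height and that fractional pixels are rounded down, as in the script above.

```python
import math


def gs_geometry(width_in, height_in, dpi=720):
    """Return a Ghostscript -g option (pixels) for a page size in inches.

    Hypothetical helper: -g takes WIDTHxHEIGHT in device pixels, so each
    dimension is multiplied by the resolution passed to -r.
    """
    width_px = math.floor(dpi * width_in)
    height_px = math.floor(dpi * height_in)
    return f"-g{width_px}x{height_px}"


# A 3.26 x 5.43 inch page at -r720
print(gs_geometry(3.26, 5.43))  # prints -g2347x3909
```

Swapping the two arguments swaps the width and height of the output page, which is one way to check whether the orientation problem comes from the -g value.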

Another Attempt

import subprocess

commands = ('gs -o homeworks.pdf -sDEVICE=pdfwrite -dDEVICEWIDTHPOINTS=674 '
            '-dDEVICEHEIGHTPOINTS=912 -dPDFFitPage '
            'hw1.pdf hw6.pdf')
subprocess.call(commands, shell=True)
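
Passing the arguments as a list avoids both the shell and the string concatenation. A minimal sketch, assuming gs is on the PATH; build_gs_command is a hypothetical helper name, and the flags are the same ones used in the command above.

```python
import subprocess


def build_gs_command(output, inputs, width_pt, height_pt):
    """Build the argument list for the pdfwrite call (hypothetical helper)."""
    return [
        "gs",
        "-o", output,
        "-sDEVICE=pdfwrite",
        f"-dDEVICEWIDTHPOINTS={width_pt}",
        f"-dDEVICEHEIGHTPOINTS={height_pt}",
        "-dPDFFitPage",
        *inputs,
    ]


cmd = build_gs_command("homeworks.pdf", ["hw1.pdf", "hw6.pdf"], 674, 912)
print(" ".join(cmd))

# To actually run it (requires Ghostscript to be installed):
# subprocess.run(cmd, check=True)
```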

This makes both pages portrait, but they do not have the same size.
When I open the output in Adobe Reader, the first page is smaller and the second fills the page.
In general, how can we make all the pages the same size?


Solution

  • You have tagged this question "ghostscript" but I assume by your use of subprocess.call() that you are not averse to using Python.

    The pagemerge canvas of the pdfrw Python library can do this. There are examples of handling different-sized pages in the examples directory and in the source of pagemerge.py. For instance, fancy_watermark.py deals with different page sizes in the context of applying watermarks.

    pdfrw can rotate, scale, or simply position source pages on the output. If you want rotation or scaling, you can look in the examples directory. (Since this is for homework, for extra credit you can control the scaling and rotation by looking at the various page sizes. :) But if all you want is for every page to be padded out to the size of the largest one, you could do that with this bit of code:

    from pdfrw import PdfReader, PdfWriter, PageMerge
    
    pages = PdfReader('hw1.pdf').pages + PdfReader('hw6.pdf').pages
    output = PdfWriter()
    
    rects = [[float(num) for num in page.MediaBox] for page in pages] 
    height = max(x[3] - x[1] for x in rects)
    width = max(x[2] - x[0] for x in rects)
    
    mbox = [0, 0, width, height]
    
    for page in pages:
        newpage = PageMerge()
        newpage.mbox = mbox              # Set boundaries of output page
        newpage.add(page)                # Add one old page to new page
        image = newpage[0]               # Get image of old page (first item)
        image.x = (width - image.w) / 2  # Center old page left/right
        image.y = (height - image.h)     # Move old page to top of output page
        output.addpage(newpage.render())
    
    output.write('homeworks.pdf')
    

    (Disclaimer: I am the primary pdfrw author.)
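
The placement arithmetic in the loop can be checked without any PDF files. The following is a sketch in plain Python, not part of pdfrw: layout is a hypothetical function, and the two MediaBox rectangles assume the page sizes from the question, read as width by height and converted to points (72 points per inch).

```python
def layout(rects):
    """Compute a common page size and per-page offsets.

    rects: list of [x0, y0, x1, y1] MediaBox values in points.
    Returns (width, height, offsets), where offsets holds the (x, y)
    position that centers each page horizontally and pins it to the top.
    """
    sizes = [(r[2] - r[0], r[3] - r[1]) for r in rects]
    width = max(w for w, _ in sizes)
    height = max(h for _, h in sizes)
    offsets = [((width - w) / 2, height - h) for w, h in sizes]
    return width, height, offsets


# Assumed MediaBoxes: 5.43 x 3.26 in and 5.43 x 6.51 in at 72 pt per inch
hw1 = [0, 0, 5.43 * 72, 3.26 * 72]
hw6 = [0, 0, 5.43 * 72, 6.51 * 72]

width, height, offsets = layout([hw1, hw6])
for name, (x, y) in zip(["hw1", "hw6"], offsets):
    print(f"{name}: x={x:.2f} pt, y={y:.2f} pt "
          f"on a {width:.2f} x {height:.2f} pt page")
```

Because PDF coordinates start at the bottom-left corner, a y offset equal to the output height minus the page height places the page against the top edge. The shorter page is therefore shifted up and the taller one stays at y = 0.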