
How can I change the sequence of the pages in a PDF file?


I want to write a simple Python program that changes the order of the pages in a PDF file, so that I can later place two A5 pages on one A4 sheet and print the document as a booklet.

As an example, this is how the pages will be placed on one sheet:

[Image: placement of the pages on one sheet]

As you can see this means that the normal sequence of the pages in the PDF file would need to be changed accordingly:

Original sequence = 1, 2, 3, 4 ...

New sequence = 2, 3, 4, 1 ...

I came up with the solution below, which works, but unfortunately it skips the last pages when the page count is not a multiple of four:

import pypdf

pdf = pypdf.PdfReader('test.pdf')
writer = pypdf.PdfWriter()

# Different sequences
sq_1 = range(1, len(pdf.pages), 4)
sq_2 = range(2, len(pdf.pages), 4)
sq_3 = range(3, len(pdf.pages), 4)
sq_4 = range(0, len(pdf.pages), 4)

# Iterate over pages
for a, b, c, d in zip(sq_1, sq_2, sq_3, sq_4):
    
    # Copy pages to new file
    writer.add_page(pdf.get_page(a))
    writer.add_page(pdf.get_page(b))
    writer.add_page(pdf.get_page(c))
    writer.add_page(pdf.get_page(d))

# write new PDF file
output = open("output.pdf", "wb")
writer.write(output)

# close files
pdf.close()
output.close()

Is there any way to easily solve this problem?

I guess the problem here is zip() but I cannot find any good replacement for it.


Solution

  • There are two ways to solve this.

    Using zip_longest() instead of zip()

    zip() stops as soon as the shortest iterable is exhausted, so an incomplete final group of pages is dropped. itertools.zip_longest() instead continues until the longest iterable is exhausted and fills the missing positions with None, which the loop then has to skip.
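To see the difference on plain index ranges first, here is a small sketch for a hypothetical 6-page document. The page count is an assumption chosen for illustration, and no PDF is opened:

```python
from itertools import zip_longest

n_pages = 6  # hypothetical page count, not a multiple of four

# Same four index sequences as in the question
sequences = [range(start, n_pages, 4) for start in (1, 2, 3, 0)]

# zip() stops as soon as the shortest range is exhausted
zip_groups = list(zip(*sequences))

# zip_longest() keeps going and pads the exhausted ranges with None
longest_groups = list(zip_longest(*sequences))

print(zip_groups)      # [(1, 2, 3, 0)]
print(longest_groups)  # [(1, 2, 3, 0), (5, None, None, 4)]
```

With zip() the pages at indices 4 and 5 never appear, which is exactly the behaviour described in the question.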

    import pypdf
    from itertools import zip_longest
    
    pdf = pypdf.PdfReader('test.pdf')
    writer = pypdf.PdfWriter()
    
    # Different sequences
    sq_1 = range(1, len(pdf.pages), 4)
    sq_2 = range(2, len(pdf.pages), 4)
    sq_3 = range(3, len(pdf.pages), 4)
    sq_4 = range(0, len(pdf.pages), 4)
    
    # Iterate over pages
    for a, b, c, d in zip_longest(sq_1, sq_2, sq_3, sq_4):
    
        # Copy pages to new file
        if a is not None:
            writer.add_page(pdf.get_page(a))
        if b is not None:
            writer.add_page(pdf.get_page(b))
        if c is not None:
            writer.add_page(pdf.get_page(c))
        if d is not None:
            writer.add_page(pdf.get_page(d))
    
    # Write the new PDF file; the with block closes it afterwards
    with open("output.pdf", "wb") as output:
        writer.write(output)
    
    # Close the reader
    pdf.close()
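If you want to check the resulting order without a PDF at hand, the index logic can be pulled into a small function. This is only a sketch: `booklet_order` is a hypothetical helper name, not part of pypdf, and it returns zero-based page indices.

```python
from itertools import zip_longest


def booklet_order(n_pages):
    """Return zero-based page indices in the reordered sequence."""
    sequences = [range(start, n_pages, 4) for start in (1, 2, 3, 0)]
    order = []
    for group in zip_longest(*sequences):
        # Skip the None padding that zip_longest adds for exhausted ranges
        order.extend(index for index in group if index is not None)
    return order


print(booklet_order(4))  # [1, 2, 3, 0]
print(booklet_order(6))  # [1, 2, 3, 0, 5, 4]
```

Each returned index could then be passed to writer.add_page(pdf.get_page(index)) in a single loop, which replaces the four separate if checks.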
    
    Manually adding the remaining pages after the loop

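    Keep the original zip() loop and add the pages it leaves out afterwards. The code below checks each sequence individually; you could also store the sequences in a list and iterate over it for fewer lines of code.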

    import pypdf
    
    pdf = pypdf.PdfReader('test.pdf')
    writer = pypdf.PdfWriter()
    
    # Different sequences
    sq_1 = range(1, len(pdf.pages), 4)
    sq_2 = range(2, len(pdf.pages), 4)
    sq_3 = range(3, len(pdf.pages), 4)
    sq_4 = range(0, len(pdf.pages), 4)
    
    # Iterate over pages
    for a, b, c, d in zip(sq_1, sq_2, sq_3, sq_4):
    
        # Copy pages to new file
        writer.add_page(pdf.get_page(a))
        writer.add_page(pdf.get_page(b))
        writer.add_page(pdf.get_page(c))
        writer.add_page(pdf.get_page(d))
    
    # Copy remaining pages
    limit = len(pdf.pages) // 4
    
    if len(sq_1) > limit:
        writer.add_page(pdf.get_page(sq_1[-1]))
    if len(sq_2) > limit:
        writer.add_page(pdf.get_page(sq_2[-1]))
    if len(sq_3) > limit:
        writer.add_page(pdf.get_page(sq_3[-1]))
    if len(sq_4) > limit:
        writer.add_page(pdf.get_page(sq_4[-1]))
    
    # Write the new PDF file; the with block closes it afterwards
    with open("output.pdf", "wb") as output:
        writer.write(output)
    
    # Close the reader
    pdf.close()
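The four if checks can be collapsed by keeping the sequences in a list and looping over it. A minimal sketch of that idea, working on indices only: `leftover_pages` is a hypothetical helper name, and the result would still need to be fed to writer.add_page() one index at a time.

```python
def leftover_pages(n_pages):
    """Return the zero-based indices that the zip() loop leaves out."""
    sequences = [range(start, n_pages, 4) for start in (1, 2, 3, 0)]
    limit = n_pages // 4  # number of complete groups of four
    # A sequence longer than `limit` has exactly one page left over
    return [seq[-1] for seq in sequences if len(seq) > limit]


print(leftover_pages(6))  # [5, 4]
print(leftover_pages(8))  # []
```

For a page count that is a multiple of four the list is empty, so the extra loop adds nothing.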