How can I change the sequence of the pages in a PDF file?

I want to make a simple Python program that changes the order of the pages in a PDF file, which would allow me later to place two pages (A5) on one sheet (A4) and print the document as a booklet on a printer.

As an example, this is how the pages will be placed on one sheet:

As you can see, this means that the normal sequence of the pages in the PDF file needs to be changed accordingly:

Original sequence = 1, 2, 3, 4 ...

New sequence = 2, 3, 4, 1 ...

I came up with the solution below, which works, but unfortunately it skips the last pages if the total number of pages is not divisible by four:

import pypdf

pdf = pypdf.PdfReader('test.pdf')
writer = pypdf.PdfWriter()

# Different sequences
sq_1 = range(1, len(pdf.pages), 4)
sq_2 = range(2, len(pdf.pages), 4)
sq_3 = range(3, len(pdf.pages), 4)
sq_4 = range(0, len(pdf.pages), 4)

# Iterate over pages
for a, b, c, d in zip(sq_1, sq_2, sq_3, sq_4):
    
    # Copy pages to new file
    writer.add_page(pdf.get_page(a))
    writer.add_page(pdf.get_page(b))
    writer.add_page(pdf.get_page(c))
    writer.add_page(pdf.get_page(d))

# write new PDF file
output = open("output.pdf", "wb")
writer.write(output)

# close files
pdf.close()
output.close()

Is there any way to easily solve this problem?

I guess the problem here is zip(), but I cannot find a good replacement for it.

Solution

Two ways to solve this

Using `zip_longest()` instead of `zip()`

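zip() stops as soon as the shortest of its inputs is exhausted, so any indices left over in the longer ranges are dropped. itertools.zip_longest() instead continues until the longest input is exhausted, filling in None for the inputs that have already run out. Those None values have to be skipped when copying pages.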

import pypdf
from itertools import zip_longest

pdf = pypdf.PdfReader('test.pdf')
writer = pypdf.PdfWriter()

# Different sequences
sq_1 = range(1, len(pdf.pages), 4)
sq_2 = range(2, len(pdf.pages), 4)
sq_3 = range(3, len(pdf.pages), 4)
sq_4 = range(0, len(pdf.pages), 4)

# Iterate over pages
for a, b, c, d in zip_longest(sq_1, sq_2, sq_3, sq_4):

    # Copy pages to new file
    if a is not None:
        writer.add_page(pdf.get_page(a))
    if b is not None:
        writer.add_page(pdf.get_page(b))
    if c is not None:
        writer.add_page(pdf.get_page(c))
    if d is not None:
        writer.add_page(pdf.get_page(d))

# Write new PDF file (the with block closes the output file automatically)
with open("output.pdf", "wb") as output:
    writer.write(output)

# Close the reader
pdf.close()
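
To see the difference without a PDF at hand, the same ranges can be built for a made-up page count. This is only an illustrative sketch: the value `n = 6` and the variable names are assumptions, not part of the original code.

```python
from itertools import zip_longest

n = 6  # hypothetical page count, not divisible by four

# Same index ranges as above, built from n instead of len(pdf.pages)
seqs = [range(1, n, 4), range(2, n, 4), range(3, n, 4), range(0, n, 4)]

# zip() stops as soon as the shortest range is exhausted
zip_groups = list(zip(*seqs))

# zip_longest() keeps going and pads the shorter ranges with None
longest_groups = list(zip_longest(*seqs))

# Flatten and drop the None padding to get the final page order
order = [i for group in longest_groups for i in group if i is not None]

print(zip_groups)      # [(1, 2, 3, 0)]
print(longest_groups)  # [(1, 2, 3, 0), (5, None, None, 4)]
print(order)           # [1, 2, 3, 0, 5, 4]
```

With zip() the pages at indices 4 and 5 never appear, which is exactly the skipped-pages symptom from the question.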

Manually adding the remaining pages after the loop

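Keep the original zip() loop and, after it finishes, add the last page of every range that is longer than the number of complete groups of four. You can also store the ranges in a list and iterate over them to get the same result in fewer lines of code.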

import pypdf

pdf = pypdf.PdfReader('test.pdf')
writer = pypdf.PdfWriter()

# Different sequences
sq_1 = range(1, len(pdf.pages), 4)
sq_2 = range(2, len(pdf.pages), 4)
sq_3 = range(3, len(pdf.pages), 4)
sq_4 = range(0, len(pdf.pages), 4)

# Iterate over pages
for a, b, c, d in zip(sq_1, sq_2, sq_3, sq_4):

    # Copy pages to new file
    writer.add_page(pdf.get_page(a))
    writer.add_page(pdf.get_page(b))
    writer.add_page(pdf.get_page(c))
    writer.add_page(pdf.get_page(d))

# Copy remaining pages
limit = len(pdf.pages) // 4

if len(sq_1) > limit:
    writer.add_page(pdf.get_page(sq_1[-1]))
if len(sq_2) > limit:
    writer.add_page(pdf.get_page(sq_2[-1]))
if len(sq_3) > limit:
    writer.add_page(pdf.get_page(sq_3[-1]))
if len(sq_4) > limit:
    writer.add_page(pdf.get_page(sq_4[-1]))

# Write new PDF file (the with block closes the output file automatically)
with open("output.pdf", "wb") as output:
    writer.write(output)

# Close the reader
pdf.close()
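
The four `if` statements can be folded into a loop over the ranges. A minimal sketch, assuming a hypothetical helper named `page_order` that only computes the page indices, so it can be checked without a PDF file:

```python
def page_order(n):
    """Return the reordered 0-based page indices for an n-page document."""
    # Same index ranges as in the code above, built from n
    seqs = [range(1, n, 4), range(2, n, 4), range(3, n, 4), range(0, n, 4)]

    # Complete groups of four pages
    order = [i for group in zip(*seqs) for i in group]

    # Leftover pages: any range longer than the number of complete groups
    limit = n // 4
    for seq in seqs:
        if len(seq) > limit:
            order.append(seq[-1])
    return order


print(page_order(6))  # [1, 2, 3, 0, 5, 4]
print(page_order(8))  # [1, 2, 3, 0, 5, 6, 7, 4]
```

Each index in the returned list could then be passed to `writer.add_page(pdf.get_page(i))`, as in the code above.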
