I want to write a simple Python program that changes the order of the pages in a PDF file, so that I can later place two A5 pages on one A4 sheet and print the document as a booklet.
As an example, this is how the pages will be placed on one sheet:
As you can see this means that the normal sequence of the pages in the PDF file would need to be changed accordingly:
Original sequence = 1, 2, 3, 4 ...
New sequence = 2, 3, 4, 1 ...
I came up with the solution below, which works, but unfortunately it skips the last pages if the page count is not divisible by four:
import pypdf
pdf = pypdf.PdfReader('test.pdf')
writer = pypdf.PdfWriter()
# Different sequences
sq_1 = range(1, len(pdf.pages), 4)
sq_2 = range(2, len(pdf.pages), 4)
sq_3 = range(3, len(pdf.pages), 4)
sq_4 = range(0, len(pdf.pages), 4)
# Iterate over pages
for a, b, c, d in zip(sq_1, sq_2, sq_3, sq_4):
    # Copy pages to new file
    writer.add_page(pdf.get_page(a))
    writer.add_page(pdf.get_page(b))
    writer.add_page(pdf.get_page(c))
    writer.add_page(pdf.get_page(d))
# write new PDF file
output = open("output.pdf", "wb")
writer.write(output)
# close files
pdf.close()
output.close()
Is there any way to easily solve this problem?
I guess the problem here is zip() but I cannot find any good replacement for it.
Use zip_longest() instead of zip()
zip() stops at the shortest list, but you can use itertools.zip_longest(), which will continue to the longest list, filling in None values for the other lists as it goes.
import pypdf
from itertools import zip_longest
pdf = pypdf.PdfReader('test.pdf')
writer = pypdf.PdfWriter()
# Different sequences
sq_1 = range(1, len(pdf.pages), 4)
sq_2 = range(2, len(pdf.pages), 4)
sq_3 = range(3, len(pdf.pages), 4)
sq_4 = range(0, len(pdf.pages), 4)
# Iterate over pages
for a, b, c, d in zip_longest(sq_1, sq_2, sq_3, sq_4):
    # Copy pages to new file, skipping the None padding
    if a is not None:
        writer.add_page(pdf.get_page(a))
    if b is not None:
        writer.add_page(pdf.get_page(b))
    if c is not None:
        writer.add_page(pdf.get_page(c))
    if d is not None:
        writer.add_page(pdf.get_page(d))
# write new PDF file
output = open("output.pdf", "wb")
writer.write(output)
# close files
pdf.close()
output.close()
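To see the difference without a PDF at hand, here is a minimal sketch that applies both functions to the same four ranges for a hypothetical 6-page document. The helper name page_order and the page count are assumptions made for illustration; no pypdf calls are involved.

```python
from itertools import zip_longest


def page_order(num_pages, combine):
    """Return 0-based page indices in the order the loop would copy them.

    `combine` is either zip or zip_longest (hypothetical helper, for illustration only).
    """
    sequences = [range(start, num_pages, 4) for start in (1, 2, 3, 0)]
    order = []
    for group in combine(*sequences):
        # zip_longest pads exhausted ranges with None; skip those slots
        order.extend(i for i in group if i is not None)
    return order


print(page_order(6, zip))          # [1, 2, 3, 0]
print(page_order(6, zip_longest))  # [1, 2, 3, 0, 5, 4]
```

With zip() the last two pages (indices 4 and 5) are dropped, while zip_longest() keeps them.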
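Alternatively, keep zip() and copy the leftover pages afterwards, as in the code below. You could also save the sequences (and with them their lengths and last pages) in a list and iterate over it for fewer lines of code.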
import pypdf
pdf = pypdf.PdfReader('test.pdf')
writer = pypdf.PdfWriter()
# Different sequences
sq_1 = range(1, len(pdf.pages), 4)
sq_2 = range(2, len(pdf.pages), 4)
sq_3 = range(3, len(pdf.pages), 4)
sq_4 = range(0, len(pdf.pages), 4)
# Iterate over pages
for a, b, c, d in zip(sq_1, sq_2, sq_3, sq_4):
    # Copy pages to new file
    writer.add_page(pdf.get_page(a))
    writer.add_page(pdf.get_page(b))
    writer.add_page(pdf.get_page(c))
    writer.add_page(pdf.get_page(d))
# Copy remaining pages
limit = len(pdf.pages) // 4
if len(sq_1) > limit:
    writer.add_page(pdf.get_page(sq_1[-1]))
if len(sq_2) > limit:
    writer.add_page(pdf.get_page(sq_2[-1]))
if len(sq_3) > limit:
    writer.add_page(pdf.get_page(sq_3[-1]))
if len(sq_4) > limit:
    writer.add_page(pdf.get_page(sq_4[-1]))
# write new PDF file
output = open("output.pdf", "wb")
writer.write(output)
# close files
pdf.close()
output.close()
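The list-based variant could look roughly like this. It is only a sketch: the function name reorder_indices is hypothetical, and it works on page indices rather than on a real file, so each index in the result would still have to be passed to writer.add_page(pdf.get_page(i)).

```python
def reorder_indices(num_pages):
    """Return 0-based page indices: full groups of four first, then leftovers."""
    sequences = [range(start, num_pages, 4) for start in (1, 2, 3, 0)]
    order = []
    # Full groups: zip() stops at the shortest range
    for group in zip(*sequences):
        order.extend(group)
    # Leftovers: a range longer than the number of full groups holds one extra page
    limit = num_pages // 4
    for seq in sequences:
        if len(seq) > limit:
            order.append(seq[-1])
    return order


print(reorder_indices(7))  # [1, 2, 3, 0, 5, 6, 4]
```

Keeping the four ranges in one list replaces the four separate if statements with a single loop.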