Merge PDF files


Is it possible, using Python, to merge separate PDF files?

Assuming so, I need to extend this a little further. I am hoping to loop through folders in a directory and repeat this procedure.

And I may be pushing my luck, but is it possible to exclude a page that is contained in each of the PDFs? (My report generation always creates an extra blank page.)


Solution

  • Use Pypdf or its successor PyPDF2:

    A Pure-Python library built as a PDF toolkit. It is capable of:

    • splitting documents page by page,
    • merging documents page by page,

    (and much more)

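    Here's a sample program written against the PdfReader/PdfWriter API. It tries PyPDF2 first and falls back to pypdf, the package under which the project is now maintained and which provides the same classes.

    #!/usr/bin/env python
    import sys
    try:
        from PyPDF2 import PdfReader, PdfWriter
    except ImportError:
        # pypdf exposes the same PdfReader/PdfWriter classes.
        from pypdf import PdfReader, PdfWriter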
    
    def pdf_cat(input_files, output_stream):
        input_streams = []
        try:
            # First open all the files, then produce the output file, and
            # finally close the input files. This is necessary because
            # the data isn't read from the input files until the write
            # operation. Thanks to
            # https://stackoverflow.com/questions/6773631/problem-with-closing-python-pypdf-writing-getting-a-valueerror-i-o-operation/6773733#6773733
            for input_file in input_files:
                input_streams.append(open(input_file, 'rb'))
            writer = PdfWriter()
            for reader in map(PdfReader, input_streams):
                for page in reader.pages:
                    writer.add_page(page)
            writer.write(output_stream)
        finally:
            for f in input_streams:
                f.close()
            output_stream.close()
    
    if __name__ == '__main__':
        if sys.platform == "win32":
            import os, msvcrt
            msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
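        # PDF data is binary: on Python 3, write to the underlying byte stream.
        pdf_cat(sys.argv[1:], getattr(sys.stdout, 'buffer', sys.stdout))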
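
For the other two parts of the question (repeating the merge for every folder, and leaving out the extra blank page), something along the following lines might work. It is only a sketch under stated assumptions: the blank page is assumed to be the last page of each PDF (adjust `skip_index` if it is elsewhere), the PDFs in a folder are merged in filename order, and the function names and the `_merged.pdf` output name are made up for illustration.

```python
from pathlib import Path


def pages_to_keep(page_count, skip_index=-1):
    """Return the page indices to copy, leaving out one page per document.

    skip_index follows Python indexing (-1 is the last page). An index that
    falls outside the document removes nothing.
    """
    skip = skip_index if skip_index >= 0 else page_count + skip_index
    return [i for i in range(page_count) if i != skip]


def merge_folder(folder, output_path, skip_index=-1):
    """Merge every PDF directly inside `folder` into `output_path`."""
    # Imported here so the helper above can be used without the library.
    try:
        from pypdf import PdfReader, PdfWriter
    except ImportError:
        from PyPDF2 import PdfReader, PdfWriter

    writer = PdfWriter()
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        reader = PdfReader(str(pdf_path))
        for i in pages_to_keep(len(reader.pages), skip_index):
            writer.add_page(reader.pages[i])
    with open(output_path, "wb") as out:
        writer.write(out)


def merge_each_subfolder(root, skip_index=-1):
    """Write one merged PDF per subfolder of `root`, next to the subfolders."""
    root = Path(root)
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        if any(folder.glob("*.pdf")):
            # Hypothetical naming scheme: <root>/<folder name>_merged.pdf
            merge_folder(folder, root / (folder.name + "_merged.pdf"), skip_index)
```

Calling `merge_each_subfolder("reports")` (where `reports` is a placeholder directory name) would then produce one merged file per subfolder. Writing the output into the parent directory keeps the merged file from being picked up as an input on a later run.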