
Merge 2 PDFs in 16-page segments


I have 2 PDFs that result from splitting a 2-up document composed of 32-page signatures. This means one PDF has pages 1-16, 33-48, 65-80, ... and the other has pages 17-32, 49-64, 81-96, ...

How can I merge the two in Python, alternating 16-page segments from each, to get a final PDF ordered 1-16, 17-32, 33-48, 49-64, ...?
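To make the layout concrete, the split described above can be written out in a few lines. This is only an illustrative sketch: the helper name `segment_pages` and the 96-page total are assumptions chosen for the example, not something taken from the actual files.

```python
def segment_pages(total_pages, segment=16):
    """Return (first_file, second_file) page numbers for alternating segments."""
    first, second = [], []
    for i, start in enumerate(range(1, total_pages + 1, segment)):
        pages = list(range(start, min(start + segment, total_pages + 1)))
        # Segments 0, 2, 4, ... go to the first file, the rest to the second.
        (first if i % 2 == 0 else second).extend(pages)
    return first, second


first, second = segment_pages(96)
print(first[:3], first[16:19])    # [1, 2, 3] [33, 34, 35]
print(second[:3], second[16:19])  # [17, 18, 19] [49, 50, 51]
```

The goal is the inverse operation: taking segment 0 of the first file, then segment 0 of the second, then segment 1 of the first, and so on.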

I can iterate through them page by page, and I can append one full PDF after the other, but I can't find the right way to merge them by segments.

The first operations are done with external software (Xerox FreeFlow Core), and I get to a point where I have 4 files with the 16-page sequences divided into even and odd pages. I join them by iterating with:

import itertools as itt
import sys

import PyPDF2 as PDF


def main():
    fbase = sys.argv[1]

    pdf_out = PDF.PdfFileWriter()

    with open(fbase + "_odd.pdf", 'rb') as f_odd:
        with open(fbase + "_even.pdf", 'rb') as f_even:
            pdf_odd = PDF.PdfFileReader(f_odd)
            pdf_even = PDF.PdfFileReader(f_even)

            # Alternate single pages: odd[0], even[0], odd[1], even[1], ...
            for p in itt.chain.from_iterable(
                itt.zip_longest(
                    pdf_odd.pages,
                    pdf_even.pages,
                )
            ):
                # zip_longest pads the shorter file with None; skip the padding.
                if p is not None:
                    pdf_out.addPage(p)

            # Write while the input files are still open.
            with open(fbase + ".pdf", 'wb') as f_out:
                pdf_out.write(f_out)

    return 0


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Wrong number of arguments!")
        sys.exit(1)

    sys.exit(main())

Afterwards I get the 2 files mentioned above. The code above would work for me if I could iterate through 16-page segments instead of page by page.

Any clues, please?

Thanks


Solution

  • Got it! Just in case anyone needs something similar, here's what worked for me:

    import sys

    # Note: PdfFileReader, PdfFileWriter and addPage are the legacy PyPDF2
    # names; PyPDF2 3.x renamed them to PdfReader, PdfWriter and add_page.
    import PyPDF2 as PDF


    def main():
        fbase = sys.argv[1]

        all_pages = []
        with open(fbase + "_odd.pdf", 'rb') as f_odd:
            with open(fbase + "_even.pdf", 'rb') as f_even:
                pdf_odd = PDF.PdfFileReader(f_odd)
                pdf_even = PDF.PdfFileReader(f_even)
                size_odd = len(pdf_odd.pages)
                size_even = len(pdf_even.pages)
                # Segment start offsets, covering the longer file so that
                # no trailing pages are dropped.
                slice_idx = range(0, max(size_odd, size_even), 16)
                for el in slice_idx:
                    # Slicing past the end simply yields a shorter (or
                    # empty) segment, so a short last segment needs no
                    # special handling.
                    all_pages.extend(pdf_odd.pages[el:el + 16])
                    all_pages.extend(pdf_even.pages[el:el + 16])

                # Write while the input files are still open.
                if all_pages:
                    pdf_out = PDF.PdfFileWriter()
                    for page in all_pages:
                        pdf_out.addPage(page)
                    with open(fbase + ".pdf", 'wb') as f_out:
                        pdf_out.write(f_out)

        return 0


    if __name__ == "__main__":
        if len(sys.argv) != 2:
            print("Wrong number of arguments!")
            sys.exit(1)

        sys.exit(main())
    

    Thanks anyway...

    BR
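The segment interleaving itself can be checked without any PDF library. Below is a minimal sketch, assuming a hypothetical helper named `interleave_segments` that works on plain lists; the integer lists stand in for page objects and are made up for the example.

```python
def interleave_segments(seq_a, seq_b, size=16):
    """Alternate chunks of `size` items from seq_a and seq_b, keeping leftovers."""
    merged = []
    longest = max(len(seq_a), len(seq_b))
    for start in range(0, longest, size):
        # Slices past the end are shorter or empty, so nothing is duplicated.
        merged.extend(seq_a[start:start + size])
        merged.extend(seq_b[start:start + size])
    return merged


# Stand-in data: page numbers as they would appear in each file.
odd_file = list(range(1, 17)) + list(range(33, 49))
even_file = list(range(17, 33)) + list(range(49, 60))  # short last segment
merged = interleave_segments(odd_file, even_file)
print(merged == list(range(1, 60)))  # True
```

With real files, the same helper could presumably be fed `list(pdf_odd.pages)` and `list(pdf_even.pages)`, with the result added to the writer page by page.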