PyPDF2 gives me blank pages in merged PDF


I asked an earlier question about this here: pypdf2-merging-pdf-pages-issue

Since then I have come a long way: I can now create my PDF files from an Excel document, via pandas, with PyPDF2.

Each merged PDF also gets the right number of pages. My problem now is that the pages in the merged PDF files are blank.

When I debug, I can see that the variable "paths" in my second loop contains the correct paths to my physical PDF files. But when they pass through:

            with path.open('rb') as pdf:
                pdf_writer.append(pdf)

an extra "\" suddenly appears in the paths, so that a path that was c:\users\... is now shown as c:\\users\\...

I do not know whether this is what prevents the files from being opened and read correctly and then merged into one PDF file.
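
Assuming the extra character is a doubled backslash seen in the debugger, the following minimal sketch suggests it is only how Python displays the string (its repr), not a change to the path itself. It uses PureWindowsPath so it runs on any operating system; the file name example.pdf is made up for illustration.

```python
from pathlib import PureWindowsPath

# Folder taken from the question; 'example.pdf' is a hypothetical file name.
path = PureWindowsPath('C:/Users/TH/PDF') / 'example.pdf'

as_text = str(path)
print(as_text)        # C:\Users\TH\PDF\example.pdf
print(repr(as_text))  # 'C:\\Users\\TH\\PDF\\example.pdf'

# The string itself holds one backslash per separator;
# only the repr shows each of them escaped as two characters.
backslash_count = as_text.count('\\')
print(backslash_count)
```

If that is what is happening here, the doubled backslashes are unlikely to be the reason the pages come out blank.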

I hope someone can guide me, as I am self-taught in Python, or otherwise explain why the merged PDF files I create suddenly consist of 3 blank pages.

My code is:

import datetime             #Handle date
import pandas as pd         #Handle data from Excel Sheet (Data analysis)
import PyPDF2 as pdf2       #Handle PDF read and merging
from pathlib import Path    #Handle path

#Skip ERROR-message: Xref table not zero-indexed. ID numbers for objects will be corrected.
#import sys
#if not sys.warnoptions:
#    import warnings
#    warnings.simplefilter("ignore")

PDF_PATH = Path('C:/Users/TH/PDF/')
EXCEL_FILENAME = 'Resources/liste.xlsx'


def main():
    today = datetime.date.today()  # The date now
    next_week = today.isocalendar()[1] + 1  # 0=Year, 1=week
    resources = pd.read_excel(EXCEL_FILENAME, sheet_name='Ark1')

    for row in resources.itertuples():
        year = row.Aargang
        paths = [
            (PDF_PATH / row.Oevelse1).with_suffix('.pdf'),
            (PDF_PATH / row.Oevelse2).with_suffix('.pdf'),
            (PDF_PATH / row.Oevelse3).with_suffix('.pdf'),
        ]
        pdf_writer = pdf2.PdfFileMerger()
        for path in paths:
            with path.open('rb') as pdf:
                pdf_writer.append(pdf)
        with open(f'Uge {next_week} - {year} Merged_doc.pdf', 'wb') as output:
            pdf_writer.write(output)


if __name__ == '__main__':
    main()
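
One way to rule out the input files as the cause is to check them before merging. The sketch below is only an illustration: find_problem_files is a hypothetical helper, not part of PyPDF2, and it only tests that each path exists and is not empty. It does not validate the PDF content.

```python
from pathlib import Path


def find_problem_files(paths):
    """Return (path, reason) pairs for inputs that are missing or empty."""
    problems = []
    for path in paths:
        if not path.is_file():
            # Covers both a non-existent path and a directory.
            problems.append((path, 'missing'))
        elif path.stat().st_size == 0:
            problems.append((path, 'empty'))
    return problems
```

Inside the row loop this could be called as find_problem_files(paths) before creating the merger, printing or skipping any row for which the returned list is not empty.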

Solution

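  • @anon01 Thanks.

    Thanks and credit also go to Sirius3.

    The problem lies in how PyPDF2 is used, together with some bugs in it. After editing the code as follows, passing each path to append() as a string instead of an open file object and closing the merger after writing, it works.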

    import datetime                     #Handle date
    import pandas as pd                 #Handle data from Excel Sheet (Data analysis)
    from PyPDF2 import PdfFileMerger    #Handle PDF read and merging
    from pathlib import Path            #Handle path
    
    #Skip ERROR-message: Xref table not zero-indexed. ID numbers for objects will be corrected.
    #import sys
    #if not sys.warnoptions:
    #    import warnings
    #    warnings.simplefilter("ignore")
    
    PDF_PATH = Path('C:/Users/TH/PDF')
    EXCEL_FILENAME = 'Resources/liste.xlsx'
    
    
    def main():
        today = datetime.date.today()  # The date now
        next_week = today.isocalendar()[1] + 1  # 0=Year, 1=week
        resources = pd.read_excel(EXCEL_FILENAME, sheet_name='Ark1')
    
        for row in resources.itertuples():
            year = row.Aargang
            paths = [
                (PDF_PATH / row.Oevelse1).with_suffix('.pdf'),
                (PDF_PATH / row.Oevelse2).with_suffix('.pdf'),
                (PDF_PATH / row.Oevelse3).with_suffix('.pdf'),
            ]
            pdf_merger = PdfFileMerger()
            for path in paths:
                pdf_merger.append(str(path))
            with open(f'Uge {next_week} - {year} Merged_doc.pdf', 'wb') as output:
                pdf_merger.write(output)
            pdf_merger.close()
    
    
    if __name__ == '__main__':
        main()
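
A side note on the week number used in the file name: today.isocalendar()[1] + 1 can produce a week that does not exist at the end of a year (for example 53 in a year with only 52 ISO weeks). A minimal sketch of one alternative, assuming "next week" means the ISO week of the date seven days from today; next_iso_week is a hypothetical helper name, and the ISO year is returned only for illustration, since the original file name uses row.Aargang.

```python
import datetime


def next_iso_week(today):
    """Return (ISO year, ISO week) for the date one week after `today`."""
    iso = (today + datetime.timedelta(weeks=1)).isocalendar()
    return iso[0], iso[1]


# Last week of 2021 rolls over to week 1 of 2022 instead of "week 53".
print(next_iso_week(datetime.date(2021, 12, 27)))  # (2022, 1)
```

In main() this could replace the next_week line with something like `_, next_week = next_iso_week(today)`.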