pdfrw - fill pdf with python, trouble using slice for multiple pages


Hi, I'm having trouble using pdfrw for Python. I'm trying to fill a PDF form with pdfrw, but I can only fill one page at a time. Indexing template_pdf.pages seems to accept only an integer and not a slice, so only the page I specify gets filled: when I index the second page, only the second page is filled, and so on. I need all four pages filled.

import pdfrw

TEMPLATE_PATH = 'temppath.pdf'
OUTPUT_PATH = 'outpath.pdf'

ANNOT_KEY = '/Annots'
ANNOT_FIELD_KEY = '/T'
ANNOT_VAL_KEY = '/V'
ANNOT_RECT_KEY = '/Rect'
SUBTYPE_KEY = '/Subtype'
WIDGET_SUBTYPE_KEY = '/Widget'

def write_fillable_pdf(input_pdf_path, output_pdf_path, data_dict):
    template_pdf = pdfrw.PdfReader(input_pdf_path)
    annotations = template_pdf.pages[:3][ANNOT_KEY]
    for annotation in annotations:
        if annotation[SUBTYPE_KEY] == WIDGET_SUBTYPE_KEY:
            if annotation[ANNOT_FIELD_KEY]:
                key = annotation[ANNOT_FIELD_KEY][1:-1]
                if key in data_dict.keys():
                    annotation.update(
                        pdfrw.PdfDict(V='{}'.format(data_dict[key]))
                    )
    pdfrw.PdfWriter().write(output_pdf_path, template_pdf)

data_dict = {}

if __name__ == '__main__':
    write_fillable_pdf(TEMPLATE_PATH, OUTPUT_PATH, data_dict)

When I use a slice,

annotations = template_pdf.pages[:3][ANNOT_KEY]

it raises the error

TypeError: list indices must be integers or slices, not str

Otherwise it only runs on a single page:

annotations = template_pdf.pages[0][ANNOT_KEY]

or

annotations = template_pdf.pages[1][ANNOT_KEY]

fills only the indicated page.

I'm having a similar issue to: How to add text to the second page in pdf with Python, Reportlab and pdfrw?

I'm working from this article: https://bostata.com/post/how_to_populate_fillable_pdfs_with_python/


Solution

  • The exception that you're seeing for the expression pages[:3][ANNOT_KEY] does not occur because of a problem taking the slice pages[:3] -- that works fine. But a slice of a list is itself a list, and the syntax [ANNOT_KEY] attempts to index into this new list using ANNOT_KEY, which is a string. Lists only accept integers or slices as indices, hence the TypeError.

    But don't take my word for it; split the line:

        annotations = template_pdf.pages[:3][ANNOT_KEY]
    

    into two lines:

        foobar = template_pdf.pages[:3]
        annotations = foobar[ANNOT_KEY]
    

    and see where the error occurs.

    Anyway, as I mentioned in a comment above, you also should not use strings to index PdfDicts -- use PdfStrings, or simply access them with the correct attributes.

    I don't personally use annotations so I'm not sure exactly what you're trying to accomplish, but if annotations is always a list if given, you could do something like this:

        annotations = []
        for page in template_pdf.pages[:3]:
            annotations.extend(page.Annots or [])
    

    (The purpose of the or [] expression above is to handle the case where a page does not have /Annots. pdfrw returns None for non-existent dict keys, to match the semantic behavior of PDF dictionaries, so you want to ensure that you are not attempting to extend a list with None.)

    You may also want to deduplicate the list, if it is possible for multiple pages to share any of the annotations.

    Disclaimer: I am the primary pdfrw author.
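
The pieces of the answer above can be combined into one function. What follows is a minimal sketch, not a tested drop-in fix: collect_annotations and the page_count parameter are hypothetical names introduced here, and deduplicating by object identity assumes that an annotation shared between pages shows up as the same Python object on each of them. Note also that pages[:3] covers only the first three pages, so filling four pages needs pages[:4].

```python
def collect_annotations(pages):
    """Return the annotations of all given pages as one flat list."""
    annotations = []
    seen = set()
    for page in pages:
        # A page without /Annots yields None, so fall back to an empty list.
        for annotation in page.Annots or []:
            # Skip an annotation object that an earlier page already contributed.
            if id(annotation) not in seen:
                seen.add(id(annotation))
                annotations.append(annotation)
    return annotations


def write_fillable_pdf(input_pdf_path, output_pdf_path, data_dict, page_count=4):
    """Fill matching form fields on the first `page_count` pages."""
    # Third-party import kept local so collect_annotations has no dependencies.
    import pdfrw

    template_pdf = pdfrw.PdfReader(input_pdf_path)
    # Slice first, then read /Annots from each page individually.
    for annotation in collect_annotations(template_pdf.pages[:page_count]):
        if annotation.Subtype == '/Widget' and annotation.T:
            key = annotation.T[1:-1]  # drop the parentheses around the field name
            if key in data_dict:
                annotation.update(pdfrw.PdfDict(V='{}'.format(data_dict[key])))
    pdfrw.PdfWriter().write(output_pdf_path, template_pdf)
```

It would be called the same way as the function in the question, for example write_fillable_pdf(TEMPLATE_PATH, OUTPUT_PATH, data_dict). Keeping the collection step in its own helper makes it easy to check on plain objects before involving a real PDF.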