
PyPDF2 change field value without dictionary


I am quite new to PyPDF2 and mostly use snippets of code I have found online. All I am doing is filling in PDF forms created with Adobe Acrobat XI Pro. Filling text fields works perfectly, but I am having trouble setting the values of dropdown lists.

I was able to determine that PyPDF2 sees the dropdown field as:

{'/FT': '/Ch', '/T': 'DocumentType', '/Ff': 4325378, '/V': 'D', '/DV': 'W'}

For a text field, it shows:

{'/FT': '/Tx', '/T': 'SupervisorName', '/Ff': 29360130}

Text fields get filled by updatePageFormFieldValues(), but I haven't found a similar way to update the values of dropdown fields. How can I directly manipulate/update the value of /V here?
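The /Ff numbers in these dumps are bit fields defined by the PDF specification (ISO 32000-1, Tables 221, 228 and 230); /V is the field's current value and /DV its default value. Decoding the flags confirms that DocumentType is a combo box, i.e. a dropdown. A small standalone sketch; the function and table names are my own, not PyPDF2 API:

```python
# Field-flag bit positions (1-based) from ISO 32000-1 (PDF 1.7):
# Table 221 (all fields), Table 228 (text fields), Table 230 (choice fields).
COMMON_FLAGS = {1: "ReadOnly", 2: "Required", 3: "NoExport"}
TEXT_FLAGS = {13: "Multiline", 14: "Password", 21: "FileSelect",
              23: "DoNotSpellCheck", 24: "DoNotScroll", 25: "Comb",
              26: "RichText"}
CHOICE_FLAGS = {18: "Combo", 19: "Edit", 20: "Sort", 22: "MultiSelect",
                23: "DoNotSpellCheck", 27: "CommitOnSelChange"}


def decode_field_flags(field_type, ff):
    """Return the names of the flags set in a field's /Ff value (hypothetical helper)."""
    table = dict(COMMON_FLAGS)
    if field_type == "/Tx":
        table.update(TEXT_FLAGS)
    elif field_type == "/Ch":
        table.update(CHOICE_FLAGS)
    # Bit position n (1-based) corresponds to the value 1 << (n - 1)
    return [name for bit, name in sorted(table.items()) if ff & (1 << (bit - 1))]


print(decode_field_flags("/Ch", 4325378))   # → ['Required', 'Combo', 'DoNotSpellCheck']
print(decode_field_flags("/Tx", 29360130))  # → ['Required', 'DoNotSpellCheck', 'DoNotScroll', 'Comb']
```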

The code handling my PDFs is as follows:

from PyPDF2 import PdfFileReader, PdfFileWriter  # PyPDF2 1.x names (deprecated in later releases)
from PyPDF2.generic import BooleanObject, DictionaryObject, NameObject


def set_need_appearances_writer(writer):
    # See 12.7.2 and 7.7.2 for more information:
    # http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
    try:
        catalog = writer._root_object
        # Get the AcroForm dictionary (creating it if missing) and add the /NeedAppearances attribute
        if "/AcroForm" not in catalog:
            # Register a new, empty AcroForm dictionary as its own indirect object
            catalog.update({
                NameObject("/AcroForm"): writer._addObject(DictionaryObject())})

        need_appearances = NameObject("/NeedAppearances")
        catalog["/AcroForm"][need_appearances] = BooleanObject(True)
        return writer

    except Exception as e:
        print('set_need_appearances_writer() catch : ', repr(e))
        return writer


def pdf_handling(f_template_file, f_output_file, f_field_dict):
    with open(f_template_file, "rb") as input_stream:
        pdf_reader = PdfFileReader(input_stream, strict=False)
        # Ask viewers to regenerate field appearances in the template as well
        if "/AcroForm" in pdf_reader.trailer["/Root"]:
            pdf_reader.trailer["/Root"]["/AcroForm"].update(
                {NameObject("/NeedAppearances"): BooleanObject(True)})

        pdf_writer = PdfFileWriter()
        set_need_appearances_writer(pdf_writer)

        # Copy the first page and fill in its form fields, matched by name (/T)
        pdf_writer.addPage(pdf_reader.getPage(0))
        pdf_writer.updatePageFormFieldValues(pdf_writer.getPage(0), f_field_dict)

        # Write while the input file is still open: the writer reads page data lazily
        with open(f_output_file, "wb") as output_stream:
            pdf_writer.write(output_stream)

And this is how I call it:

field_dict = {
    'IssueDay': DDay,
    'IssueMonth': MMonth,
    'IssueYear': YYear,
    'RecruitmentNumber': row['RecruitmentID'].zfill(5),
    'DocumentType': 'D',
}

template_file = os.path.join(template_path, 'document_template.pdf')
output_file = os.path.join(person_path, 'document_output.pdf')

pdf_handling(template_file, output_file, field_dict)

Solution

  • I tried using PyPDF2 to manipulate dropdown lists but couldn't find a solution. Instead, I found a workaround: turn the dropdown list into a text field, which you can then fill with whatever text you like, just like any other text field.

    To do this, locate the field's annotation object and change its '/FT' (field type) entry from '/Ch' (choice) to '/Tx' (text). If you look at the source code of updatePageFormFieldValues() (https://github.com/mstamy2/PyPDF2/blob/master/PyPDF2/pdf.py#L354), you will see that finding the object is straightforward: the method loops over the page's '/Annots' array and matches each annotation's '/T' entry against the field names in your dictionary. Once you have located the object, you can do:

    obj.update({NameObject('/FT'): NameObject('/Tx')})
    

    You can either save the modified PDF and fill it in later, or change the field type and then fill in the field in the same run.
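The single-pass variant (locate, retype, fill) can be sketched as the loop below. This is a hypothetical helper, not PyPDF2 API: plain dicts stand in for the resolved entries of a page's /Annots array, and it assumes each widget annotation carries its own /T, as in the question's dump. With PyPDF2 you would write the keys and values as NameObject/TextStringObject and apply them through update(), as in the obj.update(...) line above.

```python
def retype_and_fill(annotations, fields):
    """Turn matching choice fields into text fields and set their values.

    `annotations` stands in for the resolved objects of a page's /Annots
    array (plain dicts here instead of PyPDF2 DictionaryObjects).
    Returns the names of the fields that were updated.
    """
    updated = []
    for annot in annotations:
        name = annot.get("/T")
        if name not in fields:
            continue
        # The workaround: a choice field (/Ch) becomes a text field (/Tx)
        if annot.get("/FT") == "/Ch":
            annot["/FT"] = "/Tx"
        # Set the value, as updatePageFormFieldValues() does by field name
        annot["/V"] = fields[name]
        updated.append(name)
    return updated


page_annots = [
    {"/FT": "/Ch", "/T": "DocumentType", "/Ff": 4325378, "/V": "D", "/DV": "W"},
    {"/FT": "/Tx", "/T": "SupervisorName", "/Ff": 29360130},
]
print(retype_and_fill(page_annots, {"DocumentType": "W"}))  # → ['DocumentType']
print(page_annots[0]["/FT"], page_annots[0]["/V"])          # → /Tx W
```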