Tags: xml, python-2.7, floating, python-docx

Editing floating textbox using python-docx or editing document.xml?


I'm editing a docx template that contains several floating images and text boxes. I want to edit these text boxes either through python-docx or by editing document.xml directly. However, the current version of python-docx appears to support only inline "pictures", not floating shapes such as these text boxes. The end goal is to edit the text boxes with python-docx, or by accessing and modifying document.xml, ideally without unzipping the package and zipping it back up.

So far I have tried python-docx, but from my research it cannot edit text boxes.

I have also tried editing document.xml separately, which worked. However, when I zipped the directories back up and changed the extension back to .docx, I could not open the file.

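import lxml.etree as ET

xmlfile = r"path\document.xml"

# Parse straight from the path so lxml detects the file's encoding itself.
tree = ET.parse(xmlfile)
root = tree.getroot()

# iter() replaces the deprecated getiterator(); skip elements with no text.
for elem in root.iter():
    if elem.text:
        elem.text = elem.text.replace('current id in document', 'new ID in document')

# Write UTF-8 with a standalone declaration, matching the header Word itself emits.
tree.write(r"path\documentedit.xml", encoding="UTF-8", xml_declaration=True, standalone=True)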

For this first attempt, I unzipped the docx without using Python and then used Python to edit the XML. I then zipped it back up, again without Python, just to see whether I could get it to work, but I was unable to open the document.


Solution

  • There are a lot of details involved in opening an Open Packaging Convention (OPC) package and getting it back into the form it needs to be in for Word to open it.

    You can use python-docx to access the XML directly, while still leaving the details of proper saving (or "re-packaging") to python-docx.

    If you're using lxml, you can get the XML for the document part as an lxml.etree._Element object using document._element. Whatever changes you make to that XML tree will be saved when you call document.save().

    The other alternative is to access the XML text directly with document.part.blob. A changed version can be assigned (as a Python 2 str or Python 3 bytes) to document.part._blob (note the leading underscore) and will be saved when you call document.save().

    This way you avoid having to deal with all the intricacies of packaging.