python-3.x insertion cursor-position python-docx

python-docx insertion point


I may be missing something obvious, but I have not found any documentation on how to insert Word elements (tables, for example) at a specific place in a document.

I am loading an existing MS Word .docx document by using:

my_document = Document('some/path/to/my/document.docx')

My use case would be to get the 'position' of a bookmark or section in the document and then proceed to insert tables below that point.

I'm thinking about an API that would allow me to do something along these lines:

insertion_point = my_document.bookmarks['bookmark_name'].position
my_document.add_table(rows=10, cols=3, position=insertion_point+1)

I saw that there are plans to implement something akin to the 'range' object of the MS Word API, which would effectively solve this problem. In the meantime, is there a way to tell the document object's methods where to insert new elements?

Maybe I can glue some lxml code to find a node and pass that to these python-docx methods? Any help on this subject would be much appreciated! Thanks.


Solution

  • I remembered the old adage "use the source, Luke!" and figured it out. A post from the python-docx owner on the project's GitHub page also gave me a hint: https://github.com/python-openxml/python-docx/issues/7.

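    The full XML document model can be accessed through the document's _document_part._element property. It behaves exactly like an lxml etree element. From there, everything is possible. (More recent python-docx releases appear to expose the same element as document.element instead.)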
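
    To get a feel for what that element tree contains without going through python-docx at all, here is a minimal sketch using only the standard library. It assumes the main document part is stored as word/document.xml inside the .docx zip package, which is the usual location. The body_child_tags helper and the in-memory sample archive are made up for illustration and are not part of python-docx.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, the one behind the "w:" prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def body_child_tags(docx_file):
    """Return the local tag names of the top-level children of <w:body>."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    body = root.find(f"{{{W_NS}}}body")
    # Tags come back in Clark notation, "{namespace}local"; keep the local part.
    return [child.tag.split("}", 1)[1] for child in body]


# Hypothetical stand-in for a real file: a zip holding only word/document.xml.
SAMPLE_XML = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>Intro</w:t></w:r></w:p>"
    "<w:tbl/>"
    "<w:sectPr/>"
    "</w:body></w:document>"
)
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("word/document.xml", SAMPLE_XML)
sample.seek(0)

print(body_child_tags(sample))  # ['p', 'tbl', 'sectPr']
```

    On a real file you would pass its path instead of the in-memory sample. The body is a flat sequence of paragraphs (w:p) and tables (w:tbl), followed by a section properties element (w:sectPr), which is why inserting at an index works.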

    To solve my specific insertion point problem, I created a temp docx.Document object which I used to store my generated content.

    import docx
    from docx.oxml.shared import qn
    tmp_doc = docx.Document()
    
    # Generate content in tmp_doc document
    tmp_doc.add_heading('New heading', 1)
    # more content generation using docx API.
    # ...
    
    # Reference the tmp_doc XML content
    tmp_doc_body = tmp_doc._document_part._element.body
    # You could pretty print it by using:
    #print(docx.oxml.xmlchemy.serialize_for_reading(tmp_doc_body))
    

    I then loaded my docx template (containing a bookmark named 'insertion_point') into a second docx.Document object.

    doc = docx.Document('/some/path/example.docx')
    doc_body = doc._document_part._element.body
    #print(docx.oxml.xmlchemy.serialize_for_reading(doc_body))
    

    The next step is parsing the doc XML to find the insertion point. I defined a small function for the task at hand, which returns the parent paragraph element of a named bookmark:

    def get_bookmark_par_element(document, bookmark_name):
        """
        Return the parent paragraph element of the named bookmark. If no
        matching bookmark is found, 1 is returned. If the bookmark's parent
        is not a paragraph, 2 is returned.
        """
        doc_element = document._document_part._element
        bookmarks_list = doc_element.findall('.//' + qn('w:bookmarkStart'))
        for bookmark in bookmarks_list:
            name = bookmark.get(qn('w:name'))
            if name == bookmark_name:
                par = bookmark.getparent()
                if not isinstance(par, docx.oxml.CT_P):
                    return 2
                else:
                    return par
        return 1
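
    For readers wondering what qn('w:name') expands to and how the lookup works on the raw XML, here is a rough standard-library sketch of the same search. It is not the python-docx code path: the w helper and find_bookmark_paragraph are hypothetical names, the sample body is invented, and it returns None rather than the 1/2 codes used above. ElementTree has no getparent(), so it walks the paragraphs and inspects their direct children instead.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(local_name):
    """Hypothetical helper: expand a local name to Clark notation, like qn('w:...')."""
    return f"{{{W_NS}}}{local_name}"


def find_bookmark_paragraph(body, bookmark_name):
    """Return the <w:p> directly containing the named bookmark, or None."""
    for paragraph in body.iter(w("p")):
        # A bare Clark-notation tag in findall() matches direct children only.
        for start in paragraph.findall(w("bookmarkStart")):
            if start.get(w("name")) == bookmark_name:
                return paragraph
    return None


# Invented sample: the second paragraph carries the bookmark.
body = ET.fromstring(
    f'<w:body xmlns:w="{W_NS}">'
    "<w:p><w:r><w:t>Before</w:t></w:r></w:p>"
    '<w:p><w:bookmarkStart w:id="0" w:name="insertion_point"/>'
    '<w:bookmarkEnd w:id="0"/></w:p>'
    "</w:body>"
)

target = find_bookmark_paragraph(body, "insertion_point")
print(target is body[1])                         # True
print(find_bookmark_paragraph(body, "missing"))  # None
```

    Returning None for "not found" lets the caller write a plain `if target is None:` check, which is one way to handle the error control mentioned below.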
    

    The newly defined function is then used to get the parent paragraph of the 'insertion_point' bookmark. Error control is left to the reader.

    bookmark_par = get_bookmark_par_element(doc, 'insertion_point')
    

    We can now use bookmark_par's etree index to insert our tmp_doc generated content at the right place:

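    bookmark_par_parent = bookmark_par.getparent()
    index = bookmark_par_parent.index(bookmark_par) + 1
    # Iterate over a copy: lxml moves each child out of tmp_doc_body on insert.
    for child in list(tmp_doc_body):
        # Skip tmp_doc's section properties; w:sectPr must stay last in a body.
        if child.tag == qn('w:sectPr'):
            continue
        bookmark_par_parent.insert(index, child)
        index = index + 1
    # Drop the placeholder paragraph that held the bookmark.
    bookmark_par_parent.remove(bookmark_par)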
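
    The index arithmetic is easier to check on a toy tree. The following is a sketch with the standard library's ElementTree rather than lxml, so two things differ from the real code: the anchor's index is found with list(parent).index(...), and inserted elements are not removed from the source tree (lxml moves them). The splice_after name and both sample bodies are invented for illustration. It also skips w:sectPr, since a body's section properties are expected to be its last child.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
SECT_PR = f"{{{W_NS}}}sectPr"


def splice_after(parent, anchor, new_children):
    """Insert new_children into parent right after anchor, keeping their order."""
    index = list(parent).index(anchor) + 1
    for child in new_children:
        if child.tag == SECT_PR:
            continue  # section properties belong to the source document only
        parent.insert(index, child)
        index += 1  # advance so the next child lands after this one


# Invented sample bodies: the target has two paragraphs, the source a table
# and a paragraph, and each ends with its own section properties.
target = ET.fromstring(
    f'<w:body xmlns:w="{W_NS}"><w:p/><w:p/><w:sectPr/></w:body>'
)
source = ET.fromstring(
    f'<w:body xmlns:w="{W_NS}"><w:tbl/><w:p/><w:sectPr/></w:body>'
)

anchor = target[0]
splice_after(target, anchor, list(source))
print([child.tag.split("}", 1)[1] for child in target])
# ['p', 'tbl', 'p', 'p', 'sectPr']
```

    Without the index increment, every child would be inserted at the same position and the generated content would come out in reverse order.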
    

    The document is now finalized, the generated content having been inserted at the bookmark location of an existing Word document.

    # Save result
    # print(docx.oxml.xmlchemy.serialize_for_reading(doc_body))
    doc.save('/some/path/generated_doc.docx')
    

    I hope this helps someone, as the documentation on this has yet to be written.