Copy entire word document including tables to another using Python


I need to copy the entire contents of a template to a new document. The problem is that the tables are not copied. Currently, my code copies paragraphs along with run formatting such as bold and italic.

from docx import Document


def get_para_data(output_doc_name, paragraph):
    output_para = output_doc_name.add_paragraph()
    for run in paragraph.runs:
        output_run = output_para.add_run(run.text)
        # Run's bold data
        output_run.bold = run.bold
        # Run's italic data
        output_run.italic = run.italic
        # Run's underline data
        output_run.underline = run.underline
        # Run's color data
        output_run.font.color.rgb = run.font.color.rgb
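        # Run's character style; a style with this name must already exist
        # in the output document, otherwise python-docx raises KeyError
        output_run.style = run.style.name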
    # Paragraph's alignment data
    output_para.paragraph_format.alignment = paragraph.paragraph_format.alignment


input_doc = Document('templatemain.docx')
output_doc = Document()
for para in input_doc.paragraphs:
    get_para_data(output_doc, para)
output_doc.save('OutputDoc.docx')

Most of the help I've found on copying tables shows how to append them. But I am copying a template into a blank document, so that doesn't help me at all.


Solution

  • You are only iterating over the .paragraphs attribute of the document. Tables are listed separately, via the .tables attribute.

    You need to loop over all the child elements of the document body together, in document order. Otherwise you end up with all the paragraphs in one group and all the tables in another, rather than interleaved as in the original. The python-docx library doesn't offer this functionality directly, so you need to create your own iterator.

    For example, a simplified version would be:

    from docx.oxml.text.paragraph import CT_P
    from docx.oxml.table import CT_Tbl
    from docx.table import Table
    from docx.text.paragraph import Paragraph
    
    
    # select only paragraphs or table nodes
    for child in input_doc.element.body.xpath('w:p | w:tbl'):
        if isinstance(child, CT_P):
            paragraph = Paragraph(child, input_doc)
            get_para_data(output_doc, paragraph)
        elif isinstance(child, CT_Tbl):
            table = Table(child, input_doc)
            # do something with the table
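
The "do something with the table" step could be filled in along the following lines. This is only a minimal sketch, not a complete table copy: `copy_table_text` is a hypothetical helper name, and it assumes a simple grid without merged cells or nested tables. It copies the plain text of each cell and drops all formatting, borders, and table styles. It relies only on `add_table()`, `table.rows`, `table.columns`, `table.cell()`, and `cell.text` from python-docx.

```python
def copy_table_text(output_doc, table):
    """Recreate `table` at the end of `output_doc`, copying cell text only.

    Assumption: `table` is a plain grid (no merged cells, no nested tables).
    Formatting, borders, and table styles are not carried over.
    """
    n_rows = len(table.rows)
    n_cols = len(table.columns)

    # add_table() appends the new table after whatever was added so far,
    # so calling this in document order keeps tables in the right place.
    new_table = output_doc.add_table(rows=n_rows, cols=n_cols)

    for r in range(n_rows):
        for c in range(n_cols):
            new_table.cell(r, c).text = table.cell(r, c).text

    return new_table
```

In the loop above, the table branch would then call `copy_table_text(output_doc, table)` right after constructing the `Table` object.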
    

    Tables can only be contained in the document body, in table cells (so nested inside other tables), in headers and footers, footnotes, and tracked changes, but not inside paragraphs.
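
Because tables can sit inside table cells, a copy routine that only looks at the document body will miss nested ones. A rough sketch of a recursive walk follows. `iter_tables` is a hypothetical helper name, and the sketch assumes the container exposes a `.tables` list, as a python-docx document and table cell both do. With merged cells, `row.cells` can return the same cell more than once, so a nested table may be yielded repeatedly.

```python
def iter_tables(container):
    """Yield every table in `container`, including tables nested in cells.

    `container` is any object with a `.tables` list, for example a
    python-docx document or a table cell.
    """
    for table in container.tables:
        yield table
        for row in table.rows:
            for cell in row.cells:
                # Recurse into any tables nested inside this cell
                yield from iter_tables(cell)
```

For example, `for table in iter_tables(input_doc): ...` would visit each top-level table followed by the tables nested inside it.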