I am using python-docx to programmatically insert data into a new document. When I open the new file in Word, I get the following error message:
Word found unreadable content in document_name. Do you want to recover the contents of this document? If you trust the source of this document, click Yes.
Here is the process that my code is going through to get to this point:
What I have figured out so far:
After quite a bit of googling and finding similar answers that didn't resolve the issue, I think this might be relevant: the tables in the findings document contain a large number of merged cells. Each one is a single table, not nested tables as I initially thought.
The heading is 2 rows deep, with 4 merged cells on the left for the finding title and, on the right, two columns with headings and the relevant data below. The body of the table is a mixture of merged cells per row: some rows have all cells merged, others have 2 of 3 cells merged.
Here is the code I am using to snag the table from the findings document:
for table in findings_templates.tables:
    row = table.rows[0]
    for cell in row.cells:
        if title.lower() in cell.text.lower():
            severity = get_severity_from_template(table)
            for item in severity_array:
                if severity in item[1]:
                    anchor = item[0]
            # snip
            # Insert some data into table here
            # snip
            addTableAfterParagraph(report_document, table, title)
            return True
Since the errors occur with or without modification, I'll leave out the modification code. Here is the code that inserts the table into the template document:
def addTableAfterParagraph(report_document, table, title):
    for para in report_document.paragraphs:
        if para.text == title:
            p = para._p
            p.addnext(table._tbl)
Additionally, I added some print statements for table._tbl.xml, and I don't see much difference between the source table and the one inserted into the document, except that the first line has a few differing xmlns namespace declarations.
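Comparing two XML dumps by eye is easy to get wrong, so a line-based diff may help here. This is only a minimal sketch using the standard library's difflib; it assumes both inputs are strings such as those returned by table._tbl.xml, and the function name diff_table_xml is made up for illustration.

```python
import difflib


def diff_table_xml(source_xml, inserted_xml):
    """Return unified-diff lines between two XML dumps (hypothetical helper)."""
    return list(
        difflib.unified_diff(
            source_xml.splitlines(),
            inserted_xml.splitlines(),
            fromfile="source",
            tofile="inserted",
            lineterm="",
        )
    )


# Tiny hand-written inputs, standing in for the real table._tbl.xml strings.
before = "<w:tbl>\n  <w:tr/>\n</w:tbl>"
after = "<w:tbl>\n  <w:tr/>\n  <w:tr/>\n</w:tbl>"
for line in diff_table_xml(before, after):
    print(line)
```

An empty result means the two dumps are textually identical, which would point the search away from the table markup itself and toward something the table refers to elsewhere in the package.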
I'd love some troubleshooting tips, or any suggestions. Let me know if any more information is needed. Thanks in advance!
UPDATE: It's the hyperlinks in the source table that are causing the issue. I'm marking this solved for now and may open another more specific question if I can't figure it out.
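A plausible explanation, which is my assumption and not something verified in this post: a hyperlink element in the table XML carries an r:id attribute that points at a relationship stored in the source document's package. Moving the table element into another document does not bring that relationship along, so the id may dangle or point at something unrelated. The sketch below lists such references in a table's XML using only the standard library. The name find_relationship_refs and the sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace that Office Open XML uses for relationship attributes such as r:id.
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


def find_relationship_refs(table_xml):
    """Return (element name, relationship id) pairs found in a table's XML."""
    root = ET.fromstring(table_xml)
    refs = []
    for element in root.iter():
        # Tags look like "{namespace}hyperlink"; keep only the local part.
        local_name = element.tag.rsplit("}", 1)[-1]
        for attr_name, value in element.attrib.items():
            if attr_name.startswith("{" + R_NS + "}"):
                refs.append((local_name, value))
    return refs


# Hand-written sample, not taken from the real findings document.
SAMPLE_XML = (
    '<w:tbl xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" '
    'xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">'
    "<w:tr><w:tc><w:p>"
    '<w:hyperlink r:id="rId7"><w:r><w:t>Reference</w:t></w:r></w:hyperlink>'
    "</w:p></w:tc></w:tr></w:tbl>"
)

print(find_relationship_refs(SAMPLE_XML))  # [('hyperlink', 'rId7')]
```

Running this on the string from table._tbl.xml before inserting the table would show whether it depends on anything outside its own markup.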
I ended up reading data from the source document tables, then creating my own tables programmatically, and inserting that data back in along with performing any transforms, such as creating hyperlinks, styles, etc.
It was painful, but it ultimately solved the issue and provides flexibility for the future.
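For anyone attempting the same thing, here is a rough sketch of the rebuild step. It is not the code I actually used. It assumes python-docx table objects, the helper name rebuild_table is made up, and it relies on the private _tc attribute to detect merges, on the assumption that python-docx returns the same underlying element for every grid position covered by a merged cell. Only plain text is carried over; hyperlinks and styles would still have to be re-created separately.

```python
def rebuild_table(source_table, target_document):
    """Create a fresh table in target_document mirroring text and merged regions.

    Hypothetical helper: expects python-docx Table and Document objects.
    """
    n_rows = len(source_table.rows)
    n_cols = len(source_table.columns)
    new_table = target_document.add_table(rows=n_rows, cols=n_cols)

    # Keep every source cell referenced so id(cell._tc) stays stable below.
    grid = [[source_table.cell(r, c) for c in range(n_cols)] for r in range(n_rows)]

    # Map each underlying cell element to the bounding box of grid positions it covers.
    spans = {}  # id(_tc) -> [cell, top, left, bottom, right]
    for r in range(n_rows):
        for c in range(n_cols):
            cell = grid[r][c]
            span = spans.setdefault(id(cell._tc), [cell, r, c, r, c])
            span[3] = max(span[3], r)
            span[4] = max(span[4], c)

    for cell, top, left, bottom, right in spans.values():
        target = new_table.cell(top, left)
        if (top, left) != (bottom, right):
            # Re-create the merged region in the new table.
            target = target.merge(new_table.cell(bottom, right))
        target.text = cell.text  # plain text only
    return new_table


# Usage, assuming python-docx is installed (file names are placeholders):
#   from docx import Document
#   findings = Document("findings.docx")
#   report = Document("report_template.docx")
#   rebuild_table(findings.tables[0], report)
```

Note that add_table appends the new table at the end of the document, so it would still need to be moved next to the right heading afterwards. Because the table was created inside the target document, it carries no references back to the source file.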