I have a .docx file with tables, and I am trying to replace each of them with a string. I am using the following code:
```python
import pandas as pd
import re
from docx import Document

docx_file_path = "C:/Users/llama/test.docx"
document = Document(docx_file_path)
tables = document.tables

if not tables:
    print("No tables found in the document.")
else:
    for i, table in enumerate(tables, start=1):
        df = pd.DataFrame()
        for row_index, row in enumerate(table.rows):
            row_data = [cell.text.strip() for cell in row.cells]
            df = pd.concat([df, pd.DataFrame([row_data])], ignore_index=True)
        contains_specified_words = any(
            any(word in cell for cell in df.values.flatten())
            for word in ['TEST33']
        )
        if contains_specified_words:
            duplicate_columns_indices = df[df[0] == df[1]].index
            df.loc[duplicate_columns_indices, 0] = ""
            str1 = ""
            for row_index, row in df.iterrows():
                str1 += "\t".join(row.astype(str)) + "\n"
            table_parent = table._element.getparent()
            table_index = table_parent.index(table._element)
            table_parent.remove(table._element)
            document.add_paragraph(str1)
            modified_docx_path = f"C:/Users/llama/test2.docx"
            document.save(modified_docx_path)
            print(f"Modified document saved as {modified_docx_path}\n")
        else:
            print(f"DataFrame df{i} does not contain the specified words and will be skipped.")
```
But this deletes the table and appends the string at the end of the document instead of at the position of the removed table. Is there a way to insert the string where the table was?
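IIUC, here is a quick fix using `add_p_before`.

Replace these lines in the `if contains_specified_words:` block:

```python
table_index = table_parent.index(table._element)
table_parent.remove(table._element)
document.add_paragraph(str1)
```

with these:

```python
# at the top of the script
from docx.oxml.text.paragraph import CT_P
from docx.text.paragraph import Paragraph

# ... in the `if contains_specified_words:` block:
# add_p_before() only calls lxml's addprevious(), so it also works on the <w:tbl> element
tmp = CT_P.add_p_before(table._element)  # new empty <w:p> right before the table
p = Paragraph(tmp, table._parent)         # wrap it to get the python-docx Paragraph API
p.text = str1                             # "\t" becomes a tab, "\n" a line break
table_parent.remove(table._element)       # then drop the table itself
```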