Tags: python, pandas, docx, python-docx

Replacing a table with text in .docx using Python


I have a .docx file with tables, and I am trying to replace each table that contains a given word (here TEST33) with a string built from its contents. I am using the following code:

import pandas as pd
import re
from docx import Document

docx_file_path = "C:/Users/llama/test.docx"
document = Document(docx_file_path)

tables = document.tables

if not tables:
    print("No tables found in the document.")
else:
    for i, table in enumerate(tables, start=1):
        df = pd.DataFrame()
        for row_index, row in enumerate(table.rows):
            row_data = [cell.text.strip() for cell in row.cells]
            df = pd.concat([df, pd.DataFrame([row_data])], ignore_index=True)
        
        contains_specified_words = any(
            any(word in cell for cell in df.values.flatten())
            for word in ['TEST33']
        )

        if contains_specified_words:
            duplicate_columns_indices = df[df[0] == df[1]].index

            df.loc[duplicate_columns_indices, 0] = ""

            str1 = ""
            for row_index, row in df.iterrows():
                str1 += "\t".join(row.astype(str)) + "\n"

            table_parent = table._element.getparent()

            table_index = table_parent.index(table._element)
            table_parent.remove(table._element)

            document.add_paragraph(str1)

            modified_docx_path = "C:/Users/llama/test2.docx"
            document.save(modified_docx_path)

            print(f"Modified document saved as {modified_docx_path}\n")
        else:
            print(f"DataFrame df{i} does not contain the specified words and will be skipped.")

This deletes the table, but document.add_paragraph() appends the string at the end of the document rather than where the removed table was. Is there a way to insert the text in place of the table?


Solution

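  • IIUC, here is a quick fix using add_p_before, which inserts a new paragraph element directly before the table, so the text ends up where the table was instead of at the end of the document.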

    Replace these lines (35 to 38 of your code):

                table_index = table_parent.index(table._element)
                table_parent.remove(table._element)
    
                document.add_paragraph(str1)
    

    with these:

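    from docx.oxml.text.paragraph import CT_P
    from docx.text.paragraph import Paragraph
    
    # ... inside the "if contains_specified_words:" block:
    
                # Insert a new, empty <w:p> element directly before the table's <w:tbl>
                tmp = CT_P.add_p_before(table._element)
                # Wrap it in a Paragraph so its text can be set through the public API;
                # tabs and newlines in str1 become tab and line-break elements
                p = Paragraph(tmp, table._parent)
                p.text = str1
    
                # Finally, remove the table itself
                table_parent.remove(table._element)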
    

    Output: (screenshot not included)

    Used input (test.docx): (screenshot not included)
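The question's code already computes table_index but never uses it. An alternative to add_p_before is to insert the new paragraph element at that index of the table's parent and only then remove the table. Below is a minimal sketch of that idea using only the standard library's xml.etree.ElementTree on a toy <w:body>. The XML is a simplified stand-in for real document XML (not what python-docx produces), and replace_table_in_place is a hypothetical helper name used for illustration.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, as used in real .docx files
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def replace_table_in_place(body: ET.Element, tbl: ET.Element, text: str) -> ET.Element:
    """Hypothetical helper: put a <w:p> holding `text` where `tbl` was.

    Toy model only: real paragraphs also carry properties, and tabs/line
    breaks would be separate elements, which this sketch ignores.
    """
    index = list(body).index(tbl)       # the table's position among body's children
    p = ET.Element(W + "p")
    r = ET.SubElement(p, W + "r")
    t = ET.SubElement(r, W + "t")
    t.text = text
    body.insert(index, p)               # insert at the table's position, not at the end
    body.remove(tbl)                    # then drop the table
    return p


# Toy document body: paragraph, table, paragraph
xml = (
    f'<w:body xmlns:w="{W_NS}">'
    "<w:p><w:r><w:t>Before</w:t></w:r></w:p>"
    "<w:tbl/>"
    "<w:p><w:r><w:t>After</w:t></w:r></w:p>"
    "</w:body>"
)
body = ET.fromstring(xml)
replace_table_in_place(body, body.find(W + "tbl"), "TEST33\tvalue")

# Text of each remaining paragraph, in document order
order = ["".join(t.text or "" for t in p.iter(W + "t")) for p in body]
print(order)  # ['Before', 'TEST33\tvalue', 'After']
```

In python-docx, table._element.getparent() returns an lxml element, which also supports index(), insert() and remove(), so the same insert-then-remove pattern should carry over; the add_p_before version above simply avoids the index bookkeeping.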