
Replace a string in a paragraph while keeping the style (docx library)


I am replacing strings in the tables and paragraphs of a Word document. However, the styles change. How can I keep the original style formatting?

import json

from docx import Document  # python-docx

# Mapping of marker text -> replacement value.
with open(r"C:\Users\y.Israfilbayov\Desktop\testfiles\test_namedranges\VariableNames.json") as p:
    data = json.load(p)

document = Document(r"C:\Users\y.Israfilbayov\Desktop\testfiles\test_namedranges_update\F10352-JB117-FMXXX Pile XXXX As-built Memo GAIA Auto trial_v6.docx")

# Replace every key in the body paragraphs.
for key, value in data.items():
    for paragraph in document.paragraphs:
        if key in paragraph.text:
            paragraph.text = paragraph.text.replace(str(key), str(value))

# Replace every key in the paragraphs inside table cells.
for key, value in data.items():
    for table in document.tables:
        for row in table.rows:
            for cell in row.cells:
                for paragraph in cell.paragraphs:
                    if key in paragraph.text:
                        paragraph.text = paragraph.text.replace(str(key), str(value))

There was a similar post, but it did not help me (maybe I did something wrong).


Solution

  • This should meet your needs. Requires docx2python 2.0.0+

    from docx2python.utilities import replace_docx_text
    
    # input_filename and output_filename are the paths of the source .docx
    # and of the new file to write.
    replace_docx_text(
        input_filename,
        output_filename,
        ("Apples", "Bananas"),  # replace Apples with Bananas
        ("Pears", "Apples"),  # replace Pears with Apples
        ("Bananas", "Pears"),  # replace Bananas with Pears
        html=True,
    )
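
To connect this with the question, here is a minimal sketch of turning a key/value mapping (such as the one loaded from the question's JSON file) into the pairs that replace_docx_text takes. The helper names pairs_from_mapping and replace_from_mapping are hypothetical, not part of docx2python, and the sample keys are made up for illustration.

```python
def pairs_from_mapping(data):
    """Turn {marker: value} into a list of (old, new) string pairs.

    Values are converted with str() because JSON values may be numbers.
    """
    return [(str(key), str(value)) for key, value in data.items()]


def replace_from_mapping(input_filename, output_filename, data):
    """Hypothetical wrapper: apply every pair in `data` to a .docx file."""
    # Imported here so pairs_from_mapping works without docx2python installed.
    from docx2python.utilities import replace_docx_text  # docx2python 2.0.0+

    replace_docx_text(
        input_filename,
        output_filename,
        *pairs_from_mapping(data),
        html=True,
    )


# Made-up sample data standing in for the contents of the JSON file.
pairs = pairs_from_mapping({"PILE_ID": "P-17", "DEPTH_M": 12.5})
print(pairs)
```

The Apples/Pears example above suggests the pairs are applied in order, so a mapping whose values contain other keys may need its entries arranged deliberately.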
    

    You may have a problem if your replacement strings include tabs or symbols, but "regular" text replacement will work and preserve most[1] formatting.

    To preserve formatting, docx2python will not replace a string that spans a formatting change, e.g., "part of this string is bold", unless you specify html=False, in which case strings will be replaced regardless of formatting, and some formatting will be lost.
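
The same trade-off shows up if you stay with python-docx. Assigning to paragraph.text rebuilds the paragraph as a single run, which is why character formatting disappears in the question's code. Replacing text one run at a time keeps each run's formatting, but misses any marker that is split across runs. The sketch below illustrates that idea on stand-in objects; replace_in_runs and FakeRun are hypothetical names, and the function only assumes each run has a writable text attribute, as python-docx Run objects do.

```python
from dataclasses import dataclass


def replace_in_runs(runs, old, new):
    """Replace `old` with `new` inside each run's text, one run at a time.

    Returns the number of runs that were changed. A match that straddles
    two runs is left untouched, so formatting boundaries are never merged.
    """
    changed = 0
    for run in runs:
        if old in run.text:
            run.text = run.text.replace(old, new)
            changed += 1
    return changed


@dataclass
class FakeRun:
    """Stand-in for a formatted run: some text plus one formatting flag."""
    text: str
    bold: bool = False


runs = [
    FakeRun("Dear "),
    FakeRun("CLIENT_NAME", bold=True),
    FakeRun(", see PILE_"),
    FakeRun("ID", bold=True),  # marker split across a formatting change
]

count = replace_in_runs(runs, "CLIENT_NAME", "Acme")
missed = replace_in_runs(runs, "PILE_ID", "P-17")
result = "".join(run.text for run in runs)
print(count, missed, result)
```

With python-docx this would be called as replace_in_runs(paragraph.runs, key, value) for each paragraph, including those in table cells. Word can split text into several runs even where the formatting looks uniform, so some markers may still be skipped.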

    [1] The following will be preserved:

    • italic
    • bold
    • underline
    • strike
    • superscript
    • subscript
    • small caps
    • all caps
    • highlighted
    • font size
    • colored text
    • (some others, but not guaranteed)

    Edit for the follow-up question: how do I replace marker text in tables?

    My workflow for doing this is to keep all formatting in Word. That is, I create a template in Word, slice out the context I need, then put everything back together like a puzzle.

    This GitHub "project" is an example (one file) of how I replace text in tables (where the tables can be any size).

    https://github.com/ShayHill/replace_docx_tables