python-docx: replacing words within tables is not working


Good morning to all (scanny over there?)

I have a piece of code that finds predefined keywords in a Microsoft Word document and replaces them with other values.

The code works fine outside tables, but inside tables it does not work as it should.

Here is the code:

from os import listdir
from docx import Document

nuevo_codigo = input('Teclee nuevo codigo: ')
nuevo_servicio = input('Teclee nuevo servicio: ')
nuevo_cobjeto = input('Teclee nuevo codigo del objeto: ')
nuevo_objeto = input('Teclee nuevo objeto: ')
nuevo_cliente = input('Teclee nuevo cliente: ')

path_reporte = "D:/Escritorio/WORD PYTHON"

lista_documentos = []
lista_path = []

# iterate over the contents of the path
for documento in listdir(path_reporte):

    # get the document name from the path
    lista_documentos.append(documento)
    # concatenate the strings to get the full path
    lista_path.append(path_reporte + '/' + documento)

print(lista_path, lista_documentos)

for i in lista_path:

    document = Document(i)

    dic = {'PYTHON-CODIGO': nuevo_codigo,
           'PYTHON-SERVICIO': nuevo_servicio,
           'PYTHON-COBJETO': nuevo_cobjeto,
           'PYTHON-OBJETO': nuevo_objeto,
           'PYTHON-CLIENTE': nuevo_cliente,
           }

    # outside tables in the Word *.docx file, everything works fine

    for p in document.paragraphs:

        inline = p.runs

        for j in range(len(inline)):

            text = inline[j].text

            if text in dic.keys():

                text = text.replace(text, dic[text])
                inline[j].text = text

    


    # inside tables in the Word *.docx file

    for tabla in document.tables:

        for columna in tabla.columns:

            for celda in columna.cells:

                for p in celda.paragraphs:

                    inline = p.runs
                    
                    for j in range(len(inline)):

                        text = inline[j].text

                        if text in dic.keys():

                            text = text.replace(text, dic[text])

                            inline[j].text = text

    # save inside the loop so that every document is written, not only the last one
    document.save(i)

Here is one of the documents as previously configured:

[image: Word document as configured]

And after I run the code, this is what happens:

[image: document after the replacement has been done]

How should I configure the table information?

What is missing in my code?


Solution

  • Run boundaries are arbitrary. In particular, there is no guarantee that each word has its own run. If you add a print([run.text for run in inline]) statement you'll see what the actual run contents are.

    The only reliable method here is to work at the paragraph level, perhaps something like:

    paragraph.text = paragraph.text.replace(key, other_word)
    

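    The unfortunate side effect is that all character (run-level) formatting in the paragraph is lost, because assigning to paragraph.text replaces the paragraph's runs with a single new run; paragraph-level formatting such as the style is kept. If you search for "python-docx search replace" you should find more about what it takes to get around this by splitting and joining existing runs to isolate a particular word in its own run for replacement. It's not a trivial algorithm.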
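
As a rough illustration of the points above, here is a minimal sketch that searches the joined text of a paragraph's runs instead of each run on its own, and walks table cells as well as body paragraphs. It is not the run-splitting algorithm mentioned above but a simpler variant: the replacement is written into the first run that overlaps the match, and the matched text is trimmed from any later runs. The function names (iter_paragraphs, replace_in_runs, replace_all) are hypothetical. The code only assumes objects with .paragraphs, .tables, .rows, .cells, .runs and a writable .text, which is how python-docx documents, cells and runs behave as far as I know.

```python
def iter_paragraphs(container):
    """Yield every paragraph in `container`, descending into nested tables.

    Assumes `container` exposes `.paragraphs` and `.tables`, as a
    python-docx Document and a table cell both do.
    """
    yield from container.paragraphs
    for table in container.tables:
        for row in table.rows:
            for cell in row.cells:
                yield from iter_paragraphs(cell)


def replace_in_runs(paragraph, key, value):
    """Replace the first occurrence of `key`, even if it spans several runs.

    The replacement goes into the first run that overlaps the match, so it
    takes that run's character formatting; the matched text is trimmed from
    any later runs. Returns True if a replacement was made.
    """
    runs = list(paragraph.runs)
    full_text = "".join(run.text for run in runs)
    start = full_text.find(key)
    if not key or start < 0:
        return False
    end = start + len(key)

    pos = 0
    inserted = False
    for run in runs:
        text = run.text
        run_start, run_end = pos, pos + len(text)
        pos = run_end
        if run_end <= start or run_start >= end:
            continue  # this run does not overlap the match
        lo = max(start, run_start) - run_start
        hi = min(end, run_end) - run_start
        run.text = text[:lo] + ("" if inserted else value) + text[hi:]
        inserted = True
    return True


def replace_all(container, replacements):
    """Apply every key -> value pair to body and table paragraphs.

    Returns the number of replacements made.
    """
    count = 0
    for paragraph in iter_paragraphs(container):
        for key, value in replacements.items():
            if replace_in_runs(paragraph, key, value):
                count += 1
    return count
```

In the question's script this would presumably be called as replace_all(document, dic) inside the loop, before document.save(i). The sketch replaces only the first occurrence of each key in a paragraph, and it does not look at headers, footers, or text that is not exposed through paragraph.runs.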