Good morning, everyone (scanny, are you there?)
I have a piece of code that finds certain keywords in a Microsoft Word document and replaces them with other values.
The code works fine outside tables, but inside tables nothing works the way it should.
Here is the code:
from os import listdir
from docx import Document

nuevo_codigo = input('Enter new code: ')
nuevo_servicio = input('Enter new service: ')
nuevo_cobjeto = input('Enter new object code: ')
nuevo_objeto = input('Enter new object: ')
nuevo_cliente = input('Enter new client: ')
path_reporte = "D:/Escritorio/WORD PYTHON"
lista_documentos = []
lista_path = []
# iterate over the contents of the path
for documento in listdir(path_reporte):
    # get the document's name from the path
    lista_documentos.append(documento)
    # concatenate the strings to get the full path
    lista_path.append(path_reporte + '/' + documento)
print(lista_path, lista_documentos)
for i in lista_path:
    document = Document(i)
    dic = {'PYTHON-CODIGO': nuevo_codigo,
           'PYTHON-SERVICIO': nuevo_servicio,
           'PYTHON-COBJETO': nuevo_cobjeto,
           'PYTHON-OBJETO': nuevo_objeto,
           'PYTHON-CLIENTE': nuevo_cliente,
           }
    # outside tables word *.docx everything is peachy
    for p in document.paragraphs:
        inline = p.runs
        for j in range(len(inline)):
            text = inline[j].text
            if text in dic.keys():
                text = text.replace(text, dic[text])
                inline[j].text = text
    # inside tables word *.docx
    for tabla in document.tables:
        for columna in tabla.columns:
            for celda in columna.cells:
                for p in celda.paragraphs:
                    inline = p.runs
                    for j in range(len(inline)):
                        text = inline[j].text
                        if text in dic.keys():
                            text = text.replace(text, dic[text])
                            inline[j].text = text
    document.save(i)
Here is one of the documents as previously configured:
And this is what happens after I run the code:
(Image: document after the replacement has been done)
How should I configure the table information?
What is missing in my code?
Run boundaries are arbitrary. In particular, there is no guarantee that each word has its own run. If you add a print([run.text for run in inline]) statement you'll see what the actual run contents are.
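To make the point concrete, here is a small sketch that does not touch python-docx at all. The run contents in `run_texts` are hypothetical; they only illustrate one way Word might split a keyword inside a table cell. The per-run lookup is the same test the question's code performs.

```python
# Hypothetical run contents for a cell that displays "Codigo: PYTHON-CODIGO".
run_texts = ["Codigo: ", "PYTHON", "-", "CODIGO"]
dic = {"PYTHON-CODIGO": "ABC-123"}

# Per-run lookup, as in the question: no single run equals the key.
per_run_hits = [text for text in run_texts if text in dic]

# The key only becomes visible once the run texts are joined together.
paragraph_text = "".join(run_texts)
joined_hits = [key for key in dic if key in paragraph_text]

print(per_run_hits)  # []
print(joined_hits)   # ['PYTHON-CODIGO']
```

Note that the question's test is even stricter than a substring search: a run has to be exactly equal to the keyword, with no surrounding text, for it to match.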
The only reliable method here is to work at the paragraph level, perhaps something like:
paragraph.text = paragraph.text.replace(key, other_word)
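A minimal sketch of that paragraph-level approach, extended to cover tables as well, might look like the following. The helper names `iter_paragraphs` and `replace_keywords` are made up for this example. It relies only on the `paragraphs`, `tables`, `rows` and `cells` attributes, which both the document and table cells expose in python-docx, and it uses a single regular-expression pass (my choice, not something the library requires) so that one replacement value cannot be re-replaced by a later key.

```python
import re


def iter_paragraphs(container):
    """Yield every paragraph of a document or cell, descending into tables."""
    yield from container.paragraphs
    for table in container.tables:
        for row in table.rows:
            for cell in row.cells:
                # A cell has its own paragraphs and may hold nested tables.
                yield from iter_paragraphs(cell)


def replace_keywords(container, mapping):
    """Replace keys of `mapping` in each paragraph; return how many changed."""
    if not mapping:
        return 0
    # Longest keys first, so a key that is a prefix of another cannot shadow it.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    changed = 0
    for paragraph in iter_paragraphs(container):
        old = paragraph.text
        new = pattern.sub(lambda m: mapping[m.group(0)], old)
        if new != old:
            # Assigning .text rewrites the paragraph's runs, so character
            # formatting is lost, but only in paragraphs that had a match.
            paragraph.text = new
            changed += 1
    return changed


# Intended use with python-docx (file name is hypothetical):
# from docx import Document
# document = Document("report.docx")
# replace_keywords(document, dic)
# document.save("report.docx")
```

Paragraphs without a keyword are left untouched, so they keep their formatting. As far as I know, merged cells can show up more than once in `row.cells`; that is harmless here as long as no replacement value itself contains a keyword.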
The unfortunate side-effect of this is that all character formatting is lost. If you search on "python-docx search replace" you should find more about what it takes to get around this by splitting and joining existing runs to isolate a particular word in its own run for replacement. It's not a trivial algorithm.
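As a rough illustration of the run-level idea, the sketch below works on a plain list of run texts rather than on python-docx objects. The function name `replace_in_run_texts` is hypothetical, and this is a simplified variant, not the full split-and-join algorithm: instead of isolating the word in a run of its own, it writes the replacement into the run where the match starts and trims the matched characters out of any later runs. It handles only the first occurrence and ignores things like hyperlinks and fields.

```python
def replace_in_run_texts(run_texts, key, replacement):
    """Return new run texts with the first occurrence of `key` replaced."""
    texts = list(run_texts)
    if not key:
        return texts
    start = "".join(texts).find(key)
    if start == -1:
        return texts
    end = start + len(key)

    pos = 0  # offset of the current run within the joined text
    for i, text in enumerate(texts):
        run_start, run_end = pos, pos + len(text)
        pos = run_end
        if run_end <= start or run_start >= end:
            continue  # this run does not overlap the match
        after = text[end - run_start:]  # part of the run past the match
        if run_start <= start:
            # The match starts in this run: keep the prefix, insert the value.
            texts[i] = text[:start - run_start] + replacement + after
        else:
            # The match spills into this run: drop the matched characters.
            texts[i] = after
    return texts


print(replace_in_run_texts(["Codigo: ", "PYTHON", "-", "CODIGO"],
                           "PYTHON-CODIGO", "ABC-123"))
# ['Codigo: ', 'ABC-123', '', '']

# Intended use with python-docx: write the texts back run by run, so each
# run keeps its own character formatting.
# runs = paragraph.runs
# new_texts = replace_in_run_texts([r.text for r in runs], key, value)
# for run, text in zip(runs, new_texts):
#     if run.text != text:
#         run.text = text
```

With this approach the replacement takes on the formatting of the run in which the keyword began, and the other runs keep theirs. Calling the function in a loop until the texts stop changing would cover repeated occurrences, provided the replacement does not itself contain the keyword.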