I have a tool that does find-and-replace across a large batch of documents. Here is the code for it:
```python
# Find the index of the text to replace (case-insensitive)
index = paragraph.text.lower().find(find.entryWord.lower())
# str.find() returns -1 when there is no match; only modify the paragraph on a hit
if index != -1:
    # Calculate the end of the substring to be replaced
    end_index = index + len(find.entryWord)
    # Build the modified paragraph: text before the match + replacement + text after
    modified_paragraph = (
        paragraph.text[:index]
        + replace.entryWord
        + paragraph.text[end_index:]
    )
    paragraph.text = modified_paragraph
```
This works well, except that when the paragraph I'm working on contains an image, the image gets deleted. This happens when I set `paragraph.text = modified_paragraph`.
My question: is there a better way to handle images that are in the same paragraph as the text I want to change?
Note: I know about runs in python-docx, but they are extremely inconsistent in how words get broken up, so I would prefer to avoid them if I can.
There are no easy answers to this problem; they all involve working at and below the run level. But a few things would be helpful to understand:
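- An image (inline picture) is a `w:drawing` element, and those occur as a child of a run (`w:r`).
- python-docx does not (at the time of writing) have `Drawing` objects, which would be the proxy for a `w:drawing` element.
- When you assign to `Paragraph.text`, all the existing run elements are removed from the paragraph. They are not, however, automatically deleted. They just wait around for garbage collection once the last reference to them goes out of scope.

So a potential approach is to:

1. Before assigning the new text, keep a reference to each run element in the paragraph that contains a `w:drawing`.
2. Assign the modified text to `paragraph.text` as you do now.
3. Append the saved run elements back to the paragraph.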
This will entail working at the XML level for certain steps, so you'll be using things like `paragraph._p.append(run._r)`, which may be familiar from other python-docx questions and answers.
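The three steps above can be sketched on raw WordprocessingML. To keep the sketch self-contained it uses the standard library's `xml.etree.ElementTree` as a stand-in for the lxml elements python-docx works with, and `set_text_keep_drawings` is a hypothetical helper name, not part of any library. Step 2 roughly mirrors what assigning `Paragraph.text` does: every child of the `w:p` except the paragraph properties (`w:pPr`) is removed and a single new text run is added.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W_NS)


def set_text_keep_drawings(p: ET.Element, new_text: str) -> None:
    """Replace the text of a w:p element while keeping runs that hold a w:drawing.

    Hypothetical helper for illustration; not a python-docx API.
    """
    # 1. Hold references to the runs that contain an image (w:drawing)
    drawing_runs = [
        r for r in p.findall(W + "r") if r.find(".//" + W + "drawing") is not None
    ]
    # 2. Remove all content except paragraph properties, then add one text run
    for child in list(p):
        if child.tag != W + "pPr":
            p.remove(child)
    run = ET.SubElement(p, W + "r")
    t = ET.SubElement(run, W + "t")
    t.text = new_text
    t.set(XML_SPACE, "preserve")  # keep leading/trailing spaces
    # 3. Re-attach the saved drawing runs; they now follow the new text
    for r in drawing_runs:
        p.append(r)


para = ET.fromstring(
    f'<w:p xmlns:w="{W_NS}"><w:r><w:t>Hello </w:t></w:r>'
    "<w:r><w:drawing/></w:r><w:r><w:t>World</w:t></w:r></w:p>"
)
set_text_keep_drawings(para, "Hello there")
print("".join(t.text or "" for t in para.iter(W + "t")))  # prints "Hello there"
print(len(para.findall(".//" + W + "drawing")))  # prints 1
```

Two trade-offs follow from this approach: the images end up after the new text rather than at their original position, and any character formatting on the old text runs is lost, just as with a plain `paragraph.text` assignment. With python-docx itself, the same steps would plausibly be collecting `r._r` for each run in `paragraph.runs` whose XML contains a `w:drawing` (for example via `r._r.xpath(".//w:drawing")`), assigning `paragraph.text`, and then calling `paragraph._p.append(...)` on each saved element; that variant is untested here.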
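A different technique, not the one outlined in this answer, addresses the concern that Word splits words across runs unpredictably: search the joined text of all runs, then map the match back to character offsets within each run. Only the runs the match touches are rewritten, so image runs (which contribute no text) are never modified. The sketch below works on plain lists of strings; `locate_in_runs` and `replace_in_runs` are hypothetical names, and it assumes `str.lower()` does not change string length (true for ASCII text).

```python
from typing import List, Tuple


def locate_in_runs(run_texts: List[str], find: str) -> List[Tuple[int, int, int]]:
    """Map the first case-insensitive match of `find` in the joined run texts
    back to (run_index, start, end) slices, one per run the match touches.
    Returns [] when there is no match."""
    if not find:
        return []
    joined = "".join(run_texts)
    # Assumes lower() keeps string length unchanged (true for ASCII text)
    start = joined.lower().find(find.lower())
    if start == -1:
        return []
    end = start + len(find)
    pieces = []
    run_start = 0  # offset in `joined` where the current run begins
    for i, text in enumerate(run_texts):
        run_end = run_start + len(text)
        # Overlap between the match [start, end) and this run [run_start, run_end)
        lo, hi = max(start, run_start), min(end, run_end)
        if lo < hi:
            pieces.append((i, lo - run_start, hi - run_start))
        run_start = run_end
    return pieces


def replace_in_runs(run_texts: List[str], find: str, replacement: str) -> List[str]:
    """Return new run texts with the first match replaced. The replacement goes
    into the first run the match touches; the matched part of later runs is cut."""
    new_texts = list(run_texts)
    for n, (i, s, e) in enumerate(locate_in_runs(run_texts, find)):
        text = new_texts[i]
        new_texts[i] = text[:s] + (replacement if n == 0 else "") + text[e:]
    return new_texts


print(replace_in_runs(["Hel", "lo Wo", "rld"], "lo world", "there"))
# prints ['Hel', 'there', '']
```

With python-docx, the input would presumably be `[r.text for r in paragraph.runs]`, and you would assign `run.text` back only for runs whose text actually changed. The replacement inherits the formatting of the first run the match touches.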