I am looking to perform text replacements in a shape's text. I am using code similar to the snippet below:
    # define key/value
    SRKeys, SRVals = ['x', 'y', 'z'], [1, 2, 3]
    # define text
    text = shape.text
    # iterate through key/value pairs and perform subs
    for key, val in zip(SRKeys, SRVals):
        # replace text
        text = text.replace(key, str(val))
    # write text subs to comment box
    shape.text = text
However, if the initial shape.text has formatted characters (bolded, for example), the formatting is removed on the read. Is there a solution for this?
The only thing I could think of is to iterate over the characters and check for formatting, then add these formats before writing to shape.text.
@usr2564301 is on the right track. Character formatting (a.k.a. "font") is specified at the run level. That is what a run is: a "run" (sequence) of characters all sharing the same character formatting.
When you assign to shape.text you replace all the runs that used to be there with a single new run having default formatting. If you want to preserve formatting you need to preserve whatever runs are not directly involved in the text replacement.
This is not a trivial problem because there is no guarantee runs break on word boundaries. Try printing out the runs for a few paragraphs and I think you'll see what I mean.
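To see where the run boundaries actually fall, something like the following could be used. It is only a minimal sketch: describe_runs is a hypothetical helper name, and it assumes the shape has a text frame. It relies on the python-pptx attributes text_frame.paragraphs, paragraph.runs, run.text and run.font.bold.

```python
def describe_runs(shape):
    """Return (paragraph_index, run_index, text, bold) for every run in `shape`.

    Hypothetical helper; assumes `shape` has a text frame.
    `bold` is True, False, or None (None means the setting is inherited).
    """
    rows = []
    for p_idx, paragraph in enumerate(shape.text_frame.paragraphs):
        for r_idx, run in enumerate(paragraph.runs):
            rows.append((p_idx, r_idx, run.text, run.font.bold))
    return rows
```

Printing each row for a few shapes (for example `for row in describe_runs(shape): print(row)`) usually shows words split across two or more runs, which is why a plain per-run `str.replace()` can miss matches.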
In rough pseudocode, I think this is the approach you would need to take:
This preserves any runs that do not involve the search string and preserves the formatting of the "matched" word in the "replaced" word.
This requires a few operations that are not directly supported by the current API. For those you'd need to use lower-level lxml calls to directly manipulate the XML, although you could get hold of all the existing elements you need from python-pptx objects without ever having to parse the XML yourself.
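If splitting runs at the XML level is more than you need, a simplified variant can stay within the public API. This is a sketch of a different, cruder technique than the run-splitting approach described above: the replacement text is written into the first run that overlaps the match, and the matched characters are trimmed out of any later runs. The replacement therefore takes the formatting of the first matched character only, and the surrounding text in that run keeps its formatting too. The names replace_in_runs and replace_in_shape are hypothetical, and the code assumes the shape has a text frame and that matches do not span a line break within a paragraph.

```python
def replace_in_runs(runs, search, replacement):
    """Replace each occurrence of `search` across `runs`, editing run.text in place.

    Hypothetical helper; `runs` is a sequence of objects with a read/write
    `.text` attribute, such as `paragraph.runs` in python-pptx.
    Returns the number of replacements made.
    """
    if not search:
        return 0
    count = 0
    start_from = 0
    while True:
        full_text = "".join(run.text for run in runs)
        start = full_text.find(search, start_from)
        if start == -1:
            return count
        end = start + len(search)
        offset = 0
        first = True
        for run in runs:
            run_start, run_end = offset, offset + len(run.text)
            offset = run_end  # advance using the length before any edit
            lo, hi = max(start, run_start), min(end, run_end)
            if lo >= hi:
                continue  # this run does not overlap the match
            head = run.text[: lo - run_start]
            tail = run.text[hi - run_start:]
            # Only the first overlapping run receives the replacement text.
            run.text = head + (replacement if first else "") + tail
            first = False
        count += 1
        # Resume after the inserted text so it is never re-matched.
        start_from = start + len(replacement)


def replace_in_shape(shape, mapping):
    """Apply every key -> value substitution in `mapping` to `shape`, run by run."""
    for paragraph in shape.text_frame.paragraphs:
        runs = paragraph.runs
        for key, value in mapping.items():
            replace_in_runs(runs, key, str(value))
```

With the data from the question this would be called as `replace_in_shape(shape, dict(zip(SRKeys, SRVals)))`. Runs that a match empties out are left in place as empty runs rather than deleted, since removing them is one of the operations that needs the lower-level XML calls.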