I'm using Python 2.7 and python-docx-template to move information from a text file into a docx template. The text is converted to RichText before being placed in the template.
Some lines of text may contain a LaTeX command for bold somewhere in the text. I'm using re.sub() to remove the LaTeX commands, leaving only the word(s) that should be bold. As a result, the word is not bold in the final docx file. Ideally, I would like to replace the LaTeX commands with the docx commands necessary to make the word bold.
For example, 'Here is a sentence with \textbf{bold words} in the middle of it.'
I've tried replacing the LaTeX with python-docx-template's rt.add('bold words', bold=True), but it does not translate to RichText when the entire paragraph is converted to RichText. I did not really expect this to work, but I tried anyway.
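A quick check of what re.sub() returns makes the reason clearer: the substitution only produces a plain string, so the rt.add(...) text ends up as ordinary characters in the sentence and is never executed. This is a minimal sketch using only the standard library; the pattern here is my assumption (it matches a backslash followed by textbf{...} with no nested braces).

```python
import re

# Assumed pattern: a literal backslash, "textbf", then a brace group without nested braces.
latexbold = re.compile(r'\\textbf\{([^{}]*)\}')

# Raw string so that "\t" is kept as a backslash and a "t", not a tab.
sentence = r'Here is a sentence with \textbf{bold words} in the middle of it.'

# The replacement is just text; no RichText call happens here.
result = latexbold.sub(r'rt.add("\1", bold=True)', sentence)

print(type(result).__name__)
print(result)
```

Passing that string to RichText() therefore formats the whole sentence, including the rt.add(...) characters, as one run of plain text.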
I also tried adding XML commands, <w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve"> bold words </w:t></w:r>, but this did not work either.
I suspect that I will have to break the string into chunks, then rt.add() them together. If so, I'm not sure how to do this. A string may have more than one LaTeX bold command, but most strings will not have any LaTeX commands.
If chunks are necessary, how can I go about doing this? Or, is there an alternative solution?
Edit:
I was able to answer my own question but I would be glad to know of better or more efficient ways to accomplish this task.
from docxtpl import DocxTemplate, RichText
import re

tpl = DocxTemplate('test_tpl.docx')

# Raw strings keep the backslash in \textbf; in a normal string, '\t' is a tab.
startsentence = r'Here is a sentence with \textbf{bold words} in the middle of it.'
latexbold = re.compile(r'\\textbf\{([a-zA-Z0-9 .]+)\}')

# Attempt 1: strip the LaTeX command, keeping only the words.
strippedsentence = re.sub(latexbold, '\\1', startsentence)
# Attempt 2: substitute the text of an rt.add() call (it stays plain text).
rtaddsentence = re.sub(latexbold, 'rt.add(" \\1 ", bold=True)', startsentence)
# Attempt 3: substitute raw WordprocessingML run markup.
docxsentence = re.sub(latexbold, '<w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve">\\1</w:t></w:r>', startsentence)

# Convert each variant to RichText.
richstrippedsentence = RichText(strippedsentence)
richrtaddsentence = RichText(rtaddsentence)
richdocxsentence = RichText(docxsentence)

context = {
    'strippedresult': richstrippedsentence,
    'rtresult': richrtaddsentence,
    'docxresult': richdocxsentence,
}

tpl.render(context)
tpl.save('test.docx')
Here are the results in Word.
I figured out how to solve my problem with re.split. This only works if the LaTeX command is not the first part of the string, which should always be the case for my situation. However, this is not the most generic solution.
from docxtpl import DocxTemplate, RichText
import re

tpl = DocxTemplate('test_tpl.docx')

# Raw strings keep the backslash in \textbf; in a normal string, '\t' is a tab.
startsentence = r'Here is a sentence with \textbf{bold words} in the middle of it and \textbf{the end of it.}'
latexbold = re.compile(r'\\textbf\{([a-zA-Z0-9 .]+)\}')

# Splitting on a pattern with one capture group alternates between
# plain text (even indexes) and the captured bold text (odd indexes).
x = re.split(latexbold, startsentence)

rt = RichText("")
for i in range(len(x)):
    if i % 2 == 0:
        rt.add(x[i])
    else:
        rt.add(x[i], bold=True)

context = {
    'example': rt,
}

tpl.render(context)
tpl.save('test.docx')
This is the result:
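For a somewhat more generic version, the splitting step can be separated from the RichText step. The sketch below is only one possible approach, not the library's recommended way: split_bold and add_segments are names I made up, and the pattern assumes \textbf{...} never contains nested braces. It relies on re.split() returning the captured text at odd indexes, and it drops empty chunks, so a command at the very start or end of a line is handled as well.

```python
import re

# Assumed pattern: \textbf{...} with no nested braces inside the group.
LATEX_BOLD = re.compile(r'\\textbf\{([^{}]*)\}')


def split_bold(text):
    """Return a list of (chunk, is_bold) pairs for one line of text."""
    parts = LATEX_BOLD.split(text)
    segments = []
    for i, chunk in enumerate(parts):
        # Odd indexes hold the captured (bold) text; skip empty strings
        # that re.split() produces around leading or trailing matches.
        if chunk:
            segments.append((chunk, i % 2 == 1))
    return segments


def add_segments(rt, text):
    """Add each chunk to any object that has an add(text, bold=...) method."""
    for chunk, is_bold in split_bold(text):
        rt.add(chunk, bold=is_bold)
    return rt
```

With docxtpl this would be used as rt = add_segments(RichText(""), line) for each line read from the text file. Lines without any LaTeX command simply come back as a single non-bold chunk.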