I have a task where I need to put placeholders in my .docx files and automatically replace them with information that I have. I tried ${VARNAME} as the placeholder syntax, but in the document.xml for that docx file I see $, {, VARNAME and } broken up into 4 different character runs. On what basis does Word choose to do this? Is there a way to prevent it?
For replacing placeholders and manipulating docx files I am using docx4j. I am extracting the w:t nodes via XPath. Recently I tried using only $VARNAME as the placeholder syntax, and this was not broken up. Can I consider it a foolproof naming convention for placeholders? If not, can you suggest how I can tackle this situation? Would introducing custom tags in the docx help? Any advice appreciated.
You can never assume that Word will not break up a character run, and there is no guaranteed way to prevent it. You either need to change your approach for extracting the information, so that it does not rely on everything being in a single <w:t> tag, or you need to use a different kind of "target".
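To illustrate the first option, here is a minimal sketch that does not rely on the placeholder sitting in a single <w:t>. It joins the text of all runs in a paragraph, replaces ${NAME} in the joined string, writes the result into the first <w:t> and empties the rest. It uses only the JDK's DOM API rather than docx4j, and the class and method names (SplitRunPlaceholders, fill, fillAndGetText) are made up for this example. It assumes you are willing to lose the formatting of the later runs in any paragraph that contains a placeholder.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of docx4j.
class SplitRunPlaceholders {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";
    static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    // Replaces ${NAME} placeholders even when Word has split them across runs.
    static void fill(Document doc, Map<String, String> values) {
        NodeList paragraphs = doc.getElementsByTagNameNS(W, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            NodeList texts = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W, "t");
            // Join the text of every w:t in the paragraph, in document order.
            StringBuilder joined = new StringBuilder();
            for (int j = 0; j < texts.getLength(); j++) {
                joined.append(texts.item(j).getTextContent());
            }
            Matcher m = PLACEHOLDER.matcher(joined);
            StringBuffer out = new StringBuffer();
            boolean found = false;
            while (m.find()) {
                found = true;
                // Unknown names are left in place unchanged.
                String value = values.getOrDefault(m.group(1), m.group());
                m.appendReplacement(out, Matcher.quoteReplacement(value));
            }
            if (!found) continue; // leave paragraphs without placeholders untouched
            m.appendTail(out);
            // Put the whole result in the first w:t and empty the others.
            Element first = (Element) texts.item(0);
            first.setTextContent(out.toString());
            first.setAttributeNS(XML_NS, "xml:space", "preserve");
            for (int j = 1; j < texts.getLength(); j++) {
                texts.item(j).setTextContent("");
            }
        }
    }

    // Convenience wrapper for the demo: parse, fill, return the plain text.
    static String fillAndGetText(String xml, Map<String, String> values) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            fill(doc, values);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // $, {, VARNAME and } deliberately split over four runs, as in the question.
        String xml = "<w:p xmlns:w='" + W + "'>"
                + "<w:r><w:t xml:space='preserve'>Hello </w:t></w:r>"
                + "<w:r><w:t>$</w:t></w:r><w:r><w:t>{</w:t></w:r>"
                + "<w:r><w:t>VARNAME</w:t></w:r><w:r><w:t>}!</w:t></w:r></w:p>";
        System.out.println(fillAndGetText(xml, Map.of("VARNAME", "Alice"))); // prints "Hello Alice!"
    }
}
```

In a real document you would run something like this over the parsed document.xml part. If the formatting of individual runs matters, the content control approach is the safer route.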
Word does not support "custom tags", so that's not an option.
More reliable is to use a content control (the w:sdt tag). The Word Open XML looks something like this:
<w:sdt>
  <w:sdtPr>
    <w:alias w:val="test"/>
    <w:tag w:val="test"/>
    <w:id w:val="803656476"/>
    <w:placeholder>
      <w:docPart w:val="B4C191A9BCFE488E807F3919BC721619"/>
    </w:placeholder>
    <w:text/>
  </w:sdtPr>
  <w:sdtContent>
    <w:p>
      <w:r>
        <w:t>Content to be changed by code.</w:t>
      </w:r>
    </w:p>
  </w:sdtContent>
</w:sdt>
The VARNAME would be either the w:alias or the w:tag (your choice). These correspond to the Title and Tag properties, respectively, in the Word UI and object model. Because they are stored as attribute values rather than as run text, there's no way these are going to get broken up.
From there, you get the <w:t> descendant of the <w:sdtContent> element.
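As a rough sketch of that lookup, the following finds a content control by its w:tag value and overwrites the <w:t> inside its <w:sdtContent>. Again this uses only the JDK's DOM API instead of docx4j, and the class and method names (ContentControlFill, setByTag, fillAndGetText) are made up for the example. It assumes a simple, non-nested content control like the one shown above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of docx4j.
class ContentControlFill {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Sets the text of the first content control whose w:tag equals `tag`.
    // Returns false if no matching control with a w:t was found.
    static boolean setByTag(Document doc, String tag, String value) {
        NodeList sdts = doc.getElementsByTagNameNS(W, "sdt");
        for (int i = 0; i < sdts.getLength(); i++) {
            Element sdt = (Element) sdts.item(i);
            NodeList tags = sdt.getElementsByTagNameNS(W, "tag");
            if (tags.getLength() == 0) continue;
            // w:val is a namespaced attribute, so read it with getAttributeNS.
            String val = ((Element) tags.item(0)).getAttributeNS(W, "val");
            if (!tag.equals(val)) continue;
            NodeList content = sdt.getElementsByTagNameNS(W, "sdtContent");
            if (content.getLength() == 0) continue;
            NodeList texts = ((Element) content.item(0)).getElementsByTagNameNS(W, "t");
            if (texts.getLength() == 0) continue;
            // Write the value into the first w:t and clear any further ones.
            texts.item(0).setTextContent(value);
            for (int j = 1; j < texts.getLength(); j++) {
                texts.item(j).setTextContent("");
            }
            return true;
        }
        return false;
    }

    // Convenience wrapper for the demo: parse, fill, return the plain text.
    static String fillAndGetText(String xml, String tag, String value) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            setByTag(doc, tag, value);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<w:sdt xmlns:w='" + W + "'>"
                + "<w:sdtPr><w:alias w:val='test'/><w:tag w:val='test'/><w:text/></w:sdtPr>"
                + "<w:sdtContent><w:p><w:r>"
                + "<w:t>Content to be changed by code.</w:t>"
                + "</w:r></w:p></w:sdtContent></w:sdt>";
        System.out.println(fillAndGetText(xml, "test", "Alice")); // prints "Alice"
    }
}
```

Matching on w:alias instead would only mean reading the w:alias element's w:val in place of w:tag.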
If you wish, the content control can be mapped to a node in a Custom XML Part stored in the document. (Unlike custom tags in the text, Word does support adding XML files to the document's Zip package.) In that case, your code can address the Custom XML file, rather than document.xml, in order to read and write content. The changes will be reflected in the content controls linked to the nodes.