I have been given a .docx template that I need to populate in my Java application. Initially I planned to use Apache POI, since I had previously been tasked with filling in an .xlsx template and it worked well. However, based on my research, docx4j is more suitable for my case.
The .docx template uses a Plain Text Content Control. On inspecting its XML structure, I see the <w:sdt> directly under a <w:p>, which is directly under the <w:body> tag:
<w:body>
  ...
  <w:p w:rsidR="00ED05E8" w:rsidRPr="00DA4BE7" w:rsidRDefault="00AC5B37" w:rsidP="00BA6F7F">
    ...
    <w:sdt>
      <w:sdtPr>
        <w:rPr>
          <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
          <w:i/>
          <w:sz w:val="24"/>
          <w:szCs w:val="24"/>
          <w:u w:val="single"/>
        </w:rPr>
        <w:alias w:val="Name of Office/Agency Name"/>
        <w:tag w:val="Name of Office/Agency Name"/>
        <w:id w:val="-781645881"/>
        <w:placeholder>
          <w:docPart w:val="DefaultPlaceholder_-1854013440"/>
        </w:placeholder>
        <w:text/>
      </w:sdtPr>
      <w:sdtEndPr/>
      <w:sdtContent>
        <w:r w:rsidR="00340180" w:rsidRPr="00616BA5">
          <w:rPr>
            <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
            <w:i/>
            <w:sz w:val="24"/>
            <w:szCs w:val="24"/>
            <w:u w:val="single"/>
          </w:rPr>
          <w:t>(Name of Office/Agency Name)</w:t>
        </w:r>
      </w:sdtContent>
    </w:sdt>
    ...
</w:body>
I want to change the text in the <w:t> of that <w:sdt> from "(Name of Office/Agency Name)" to a different String. The problem is that I do not know how, and I am stuck after these lines:
WordprocessingMLPackage document = WordprocessingMLPackage.load(new java.io.File(...));
MainDocumentPart mainDocument = document.getMainDocumentPart();
I have this w:id of -781645881, but I don't know what to do with this information. Is this even the itemId referred to in the ContentControlsXmlEdit sample class from the docx4j site?
I cannot fetch that <w:sdt> node even with the following code:
String itemId = "-781645881".toLowerCase();
CustomXmlDataStoragePart customXmlDataStoragePart = (CustomXmlDataStoragePart)wordMLPackage.getCustomXmlDataStorageParts().get(itemId);
CustomXmlDataStorage customXmlDataStorage = customXmlDataStoragePart.getData();
What should I do to be able to change the value of the plain text content control?
This answer is something I devised out of desperation, for the following reasons:
- [...] .xml programmatically using docx4j.
- [...] .xml file of the .docx file I'm processing.
- [...] storeItemid found on my .xml of the .docx file.
Here is my utility class, written in Groovy:
import javax.xml.bind.JAXBElement

import org.docx4j.openpackaging.packages.WordprocessingMLPackage
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart
import org.docx4j.wml.CTBookmark
import org.docx4j.wml.P
import org.docx4j.wml.R
import org.docx4j.wml.SdtBlock
import org.docx4j.wml.SdtContent
import org.docx4j.wml.SdtRun
import org.docx4j.wml.Text

class WordReport {

    private WordprocessingMLPackage document
    // Maps the current text of a content control to its replacement text.
    private Map<String, String> contentControlMapping
    private Map<String, Object> reportArgs

    public WordReport(Map<String, Object> reportArgs) {
        document = WordprocessingMLPackage.createPackage()
        this.reportArgs = reportArgs
    }

    public WordprocessingMLPackage exportReport() {
        return document
    }

    private String getNewMapping(String contentControlText) {
        return contentControlMapping.get(contentControlText)
    }

    private boolean isMapped(String contentControlText) {
        return contentControlMapping.containsKey(contentControlText)
    }

    protected void mapNewMapping() {
        MainDocumentPart mainDocument = document.getMainDocumentPart()
        // Fetch every content control (<w:sdt>) in the main document part.
        List<Object> nodes = mainDocument.getJAXBNodesViaXPath("//w:sdt", false)
        String key

        nodes.each { n ->
            // Declared per node so an unhandled node never reuses the previous node's content.
            SdtContent content = null
            if (n instanceof SdtBlock) {
                content = n.getSdtContent()
            }
            else if (n instanceof JAXBElement) {
                if (n.getValue() instanceof SdtRun) {
                    content = n.getValue().getSdtContent()
                }
            }
            if (content == null) {
                return // inside each(), this skips to the next node
            }

            content.getContent().each { sdtcc ->
                if (sdtcc instanceof P) {
                    // Block-level control: paragraph -> paragraph children -> their children.
                    sdtcc.getContent().each { pc ->
                        pc.getContent().each { rc ->
                            println "rc.getValue().getClass(): " + rc.getValue().getClass()
                            if (rc.getValue() instanceof Text) {
                                key = rc.getValue().getValue()
                                if (isMapped(key)) {
                                    rc.getValue().setValue(getNewMapping(key))
                                }
                            }
                            else if (rc.getValue() instanceof R) {
                                rc.getValue().getContent().each { rrc ->
                                    if (rrc instanceof JAXBElement) {
                                        key = rrc.getValue().getValue()
                                        if (isMapped(key)) {
                                            rrc.getValue().setValue(getNewMapping(key))
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
                else if (sdtcc instanceof R) {
                    // Inline control: the run sits directly under <w:sdtContent>.
                    sdtcc.getContent().each { rc ->
                        if (rc instanceof JAXBElement) {
                            key = rc.getValue().getValue()
                            if (isMapped(key)) {
                                rc.getValue().setValue(getNewMapping(key))
                            }
                        }
                    }
                }
                else if (sdtcc instanceof JAXBElement) {
                    if (sdtcc.getValue() instanceof CTBookmark) {
                        // Bookmarks carry no text to replace.
                    }
                    else if (sdtcc.getValue() instanceof JAXBElement) {
                        key = sdtcc.getValue().getValue()
                        if (isMapped(key)) {
                            sdtcc.getValue().setValue(getNewMapping(key))
                        }
                    }
                }
            }
        }
    }

    public void setMapping(Map contentControlMapping) {
        this.contentControlMapping = contentControlMapping
    }
}
The core part of this class is the mapNewMapping() method. It applies the entries in the contentControlMapping variable to any <w:t> inside a <w:sdt>, regardless of whether the text sits in a <w:r> directly under the content control or is nested deeper, for example inside a <w:p>. I retrieve the list of all <w:sdt> elements using the getJAXBNodesViaXPath() method.
The limitation is that this supports only a limited set of combinations of P, R, CTBookmark, SdtBlock, SdtContent, and SdtRun. If the <w:t> sits inside a complex or deeply nested .xml structure that I have not anticipated, it will not be mapped. That is why I mentioned that I first read the .xml of the .docx file.
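One possible way around the nesting limitation is to stop enumerating parent types and instead visit every <w:t> that has a <w:sdt> ancestor, however deep it sits. The following is only a minimal sketch of that idea. It deliberately uses the JDK's own DOM API on the raw document.xml string rather than docx4j, and the class name SdtTextReplacer and the method replaceInContentControls are hypothetical names made up for illustration. Extracting document.xml from the .docx package and writing it back are not shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of docx4j: replaces text inside content controls.
public class SdtTextReplacer {

    // Main WordprocessingML namespace, bound to the "w" prefix in document.xml.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the XML with every <w:t> under a <w:sdt> replaced when its
    // exact text is a key of the mapping; all other text is left untouched.
    public static String replaceInContentControls(String documentXml,
                                                  Map<String, String> mapping) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace-based lookups
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(documentXml)));

            // Visit each <w:t> once, so nested content controls are not processed twice.
            NodeList texts = doc.getElementsByTagNameNS(W_NS, "t");
            for (int i = 0; i < texts.getLength(); i++) {
                Element t = (Element) texts.item(i);
                if (!hasSdtAncestor(t)) {
                    continue; // ordinary text outside any content control
                }
                String replacement = mapping.get(t.getTextContent());
                if (replacement != null) {
                    t.setTextContent(replacement);
                }
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process document XML", e);
        }
    }

    // Walks up the tree looking for an enclosing <w:sdt>, at any depth.
    private static boolean hasSdtAncestor(Node node) {
        for (Node p = node.getParentNode(); p != null; p = p.getParentNode()) {
            if (p instanceof Element
                    && W_NS.equals(p.getNamespaceURI())
                    && "sdt".equals(p.getLocalName())) {
                return true;
            }
        }
        return false;
    }
}
```

Calling SdtTextReplacer.replaceInContentControls(xml, Map.of("(Name of Office/Agency Name)", "some new value")) would then cover the <w:sdt> from the question without caring whether the run is wrapped in a paragraph. Note that this still matches on the exact text of a single <w:t>, so it would miss placeholder text that Word has split across several runs.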