java ms-office docx4j

Changing the default text of a Plain Text Content Control in an existing .docx file


I was given a .docx template that I need to populate from my Java application. Initially I planned to use Apache POI, since I had previously been tasked with filling in an .xlsx template and it worked well. Based on my research, however, docx4j is more suitable for my case.

The .docx template uses Plain Text Content Controls.

Inspecting its XML structure, I see that the <w:sdt> sits directly under a <w:p>, which in turn is directly under the <w:body> tag.

<w:body>
    ...
    <w:p w:rsidR="00ED05E8" w:rsidRPr="00DA4BE7" w:rsidRDefault="00AC5B37" w:rsidP="00BA6F7F">
        ...
        <w:sdt>
            <w:sdtPr>
                <w:rPr>
                    <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
                    <w:i/>
                    <w:sz w:val="24"/>
                    <w:szCs w:val="24"/>
                    <w:u w:val="single"/>
                </w:rPr>
                <w:alias w:val="Name of Office/Agency Name"/>
                <w:tag w:val="Name of Office/Agency Name"/>
                <w:id w:val="-781645881"/>
                <w:placeholder>
                    <w:docPart w:val="DefaultPlaceholder_-1854013440"/>
                </w:placeholder>
                <w:text/>
            </w:sdtPr>
            <w:sdtEndPr/>
            <w:sdtContent>
                <w:r w:rsidR="00340180" w:rsidRPr="00616BA5">
                    <w:rPr>
                        <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
                        <w:i/>
                        <w:sz w:val="24"/>
                        <w:szCs w:val="24"/>
                        <w:u w:val="single"/>
                    </w:rPr>
                    <w:t>(Name of Office/Agency Name)</w:t>
                </w:r>
            </w:sdtContent>
        </w:sdt>
        ...
    </w:p>
</w:body>

I want to change the text in the <w:t> of that <w:sdt> from "(Name of Office/Agency Name)" to a different String. The problem is that I do not know how, and I am stuck after these lines:

WordprocessingMLPackage document = WordprocessingMLPackage.load(new java.io.File(...));
MainDocumentPart mainDocument = document.getMainDocumentPart();

I have this w:id of -781645881, but I don't know what to do with it. Is this even the itemId referred to in the ContentControlsXmlEdit sample class from the docx4j site?

I cannot fetch that <w:sdt> node even after using the following code:

String itemId = "-781645881".toLowerCase();
CustomXmlDataStoragePart customXmlDataStoragePart = (CustomXmlDataStoragePart)document.getCustomXmlDataStorageParts().get(itemId);
CustomXmlDataStorage customXmlDataStorage = customXmlDataStoragePart.getData();

What should I do to be able to change the value of the plain text content control?


Solution

  • This answer is something I devised out of desperation, for the following reasons:

    1. I have not yet mastered accessing things in Word's .xml programmatically using docx4j.
    2. I extracted the exact .xml file of the .docx file I'm processing.
    3. There is no storeItemID in the .xml of my .docx file.

    Here is my utility class, written in Groovy:

    import javax.xml.bind.JAXBElement
    import org.docx4j.openpackaging.packages.WordprocessingMLPackage
    import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart
    import org.docx4j.wml.CTBookmark
    import org.docx4j.wml.P
    import org.docx4j.wml.R
    import org.docx4j.wml.SdtBlock
    import org.docx4j.wml.SdtContent
    import org.docx4j.wml.SdtRun
    import org.docx4j.wml.Text
    
    class WordReport {
        private WordprocessingMLPackage document
        private Map<String, String> contentControlMapping
        private Map<String, Object> reportArgs
    
        public WordReport(Map<String, Object> reportArgs) {
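            // createPackage() starts from an empty document; to fill an existing
            // template, load it here instead with WordprocessingMLPackage.load(...)
            document = WordprocessingMLPackage.createPackage()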
            this.reportArgs = reportArgs
        }
    
        public WordprocessingMLPackage exportReport() {
            return document
        }
    
        private String getNewMapping(String contentControlText)  {
            return contentControlMapping.get(contentControlText)
        }
    
        private boolean isMapped(String contentControlText) {
            return contentControlMapping.containsKey(contentControlText)
        }
    
        protected void mapNewMapping() {
            MainDocumentPart mainDocument = document.getMainDocumentPart()
            List<Object> nodes = mainDocument.getJAXBNodesViaXPath("//w:sdt", false)
    
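            String key
            nodes.each { n ->
                // Declared per node so a value from a previous iteration is never reused
                SdtContent content = null
                if(n instanceof SdtBlock) {
                    content = n.getSdtContent()
                }
                else if(n instanceof JAXBElement) {
                    if(n.getValue() instanceof SdtRun) {
                        content = n.getValue().getSdtContent()
                    }
                }
                // Skip anything that is neither a block- nor a run-level content control
                if(content == null) {
                    return
                }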
    
                content.getContent().each { sdtcc ->
                    if(sdtcc instanceof P) {
                        sdtcc.getContent().each { pc ->
                            pc.getContent().each { rc ->
                                // Run children such as <w:br/> are plain objects, not JAXBElements
                                if(!(rc instanceof JAXBElement)) {
                                    return
                                }
                                if(rc.getValue() instanceof Text) {
                                    key = rc.getValue().getValue()
                                    isMapped(key) ? rc.getValue().setValue(getNewMapping(key)) : null
                                }
                                else if(rc.getValue() instanceof R) {
                                    rc.getValue().getContent().each { rrc ->
                                        if(rrc instanceof JAXBElement) {
                                            key = rrc.getValue().getValue()
                                            isMapped(key) ? rrc.getValue().setValue(getNewMapping(key)) : null
                                        }
                                    }
                                }
                            }
                        }
                    }
                    else if(sdtcc instanceof R) {
                        sdtcc.getContent().each { rc ->
                            if(rc instanceof JAXBElement) {
                                key = rc.getValue().getValue()
                                isMapped(key) ? rc.getValue().setValue(getNewMapping(key)) : null
                            }
                        }
                    }
                    else if(sdtcc instanceof JAXBElement) {
                        if(sdtcc.getValue() instanceof CTBookmark) {
    
                        }
                        else if(sdtcc.getValue() instanceof JAXBElement) {
                            key = sdtcc.getValue().getValue()
                            isMapped(key) ? sdtcc.getValue().setValue(getNewMapping(key)) : null
                        }
                    }
                }
            }
        }
    
        public void setMapping(Map contentControlMapping) {
            this.contentControlMapping = contentControlMapping
        }
    }
    

    The core of this class is the mapNewMapping() method. It looks up the current text of each <w:t> found inside a <w:sdt> in contentControlMapping and, when that text is a key there, replaces it with the mapped value. This works whether the <w:t> sits in a <w:r> directly under the <w:sdtContent> or is nested inside a <w:p>. The list of all <w:sdt> nodes is retrieved with the getJAXBNodesViaXPath() method.

    The limitation is that this supports only a limited set of combinations of P, R, CTBookmark, SdtBlock, SdtContent, and SdtRun. If a <w:t> sits inside complex or deeply nested .xml that I have not anticipated, it will not be mapped. That is why I mentioned that I first read the .xml of the .docx file.
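
    As a rough sketch of one way around that nesting limitation, the same text-for-text mapping can be done on the raw XML with the JDK's DOM API instead of docx4j's object model, since getElementsByTagNameNS() finds <w:t> elements at any depth. This is a swapped-in technique, not what the class above does. SdtTextMapper, parse() and replaceMapped() are hypothetical names, and the sketch assumes the contents of word/document.xml have already been read out of the .docx (a zip archive) into a String.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of docx4j: works on the XML text of word/document.xml.
public class SdtTextMapper {
    // WordprocessingML main namespace (the "w:" prefix in document.xml)
    public static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Parses the XML text into a namespace-aware DOM. */
    public static Document parse(String documentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(documentXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document XML", e);
        }
    }

    /** True if the node has a <w:sdt> ancestor at any distance. */
    private static boolean isInsideSdt(Node node) {
        for (Node p = node.getParentNode(); p != null; p = p.getParentNode()) {
            if (W_NS.equals(p.getNamespaceURI()) && "sdt".equals(p.getLocalName())) {
                return true;
            }
        }
        return false;
    }

    /**
     * Replaces the text of every <w:t> inside a content control when its
     * current text is a key of the mapping. Returns how many were changed.
     */
    public static int replaceMapped(Document doc, Map<String, String> mapping) {
        int changed = 0;
        NodeList texts = doc.getElementsByTagNameNS(W_NS, "t");
        for (int i = 0; i < texts.getLength(); i++) {
            Element t = (Element) texts.item(i);
            if (!isInsideSdt(t)) {
                continue; // ordinary text outside any content control is left alone
            }
            String replacement = mapping.get(t.getTextContent());
            if (replacement != null) {
                t.setTextContent(replacement);
                changed++;
            }
        }
        return changed;
    }
}
```

    Writing the modified DOM back into the archive is left out here. Like the Groovy class, this only matches when the whole placeholder sits in a single <w:t>; text that Word has split across several runs would not match.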