Apache POI XWPF - How do we split a docx into two sections?


I have an existing document (as a byte array) that I parse into an XWPFDocument using:

InputStream is = new ByteArrayInputStream(docuByte);
XWPFDocument docx = new XWPFDocument(OPCPackage.open(is));

This document has at least 5 pages. I plan to set blank footers on the first two pages (the title and TOC pages) and a page footer from the third page onward.

In order to do this, I understand that I need to separate the document into two different sections.

section 1 - first and second page
section 2 - third page and up

However, I could not find a method that would enable me to split the document into two sections. Would anyone know how to implement this?


Solution

  • Up to now there is no dedicated method in XWPFDocument for adding section breaks, so one needs to use the underlying org.openxmlformats.schemas.wordprocessingml.x2006.main.* classes.

    A section break in Office Open XML Word documents (*.docx) is a paragraph whose paragraph properties contain section properties. So what is needed is to insert such a paragraph into the document. To insert a paragraph, XWPFDocument provides the method insertNewParagraph(org.apache.xmlbeans.XmlCursor cursor). But to obtain this cursor, one needs to know where the paragraph shall be inserted. This can be an already present paragraph containing a certain text, for example.

    The inserted section properties then apply to the section that ends at that paragraph, that is, to the content above it.

    The document body also has section properties; these apply to the last section.
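
    To make that layout concrete, here is a minimal sketch that uses only the JDK's DOM parser, not Apache POI. It walks a hand-written WordprocessingML body and reports where each section ends and how many footer references it carries. The class name SectionLayoutSketch, the SAMPLE markup and the relationship id rId1 are assumptions made up for this illustration; a real document.xml contains much more.

    ```java
    import java.io.ByteArrayInputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;

    class SectionLayoutSketch {

        static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
        static final String R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

        // Hypothetical, heavily simplified document.xml: three body paragraphs, two sections.
        static final String SAMPLE =
            "<w:document xmlns:w=\"" + W_NS + "\" xmlns:r=\"" + R_NS + "\"><w:body>"
            + "<w:p><w:r><w:t>Title page</w:t></w:r></w:p>"
            // section break: a paragraph whose pPr holds the sectPr of the section above it
            + "<w:p><w:pPr><w:sectPr/></w:pPr></w:p>"
            + "<w:p><w:r><w:t>Chapter 1</w:t></w:r></w:p>"
            // body-level sectPr: properties of the last section
            + "<w:sectPr><w:footerReference w:type=\"default\" r:id=\"rId1\"/></w:sectPr>"
            + "</w:body></w:document>";

        // True if the node is an element with the given local name in the w: namespace.
        static boolean isW(Node n, String localName) {
            return n.getNodeType() == Node.ELEMENT_NODE
                && W_NS.equals(n.getNamespaceURI())
                && localName.equals(n.getLocalName());
        }

        // First direct child element of parent with the given w: local name, or null.
        static Element child(Element parent, String localName) {
            for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (isW(n, localName)) return (Element) n;
            }
            return null;
        }

        static int footerRefs(Element sectPr) {
            return sectPr.getElementsByTagNameNS(W_NS, "footerReference").getLength();
        }

        // One entry per section, in document order; only top-level body paragraphs are inspected.
        static List<String> describeSections(String documentXml) {
            try {
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                Element root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(documentXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
                Element body = child(root, "body");
                if (body == null) throw new IllegalArgumentException("no w:body element");
                List<String> sections = new ArrayList<>();
                int paragraphIndex = 0;
                for (Node n = body.getFirstChild(); n != null; n = n.getNextSibling()) {
                    if (!isW(n, "p")) continue;
                    Element pPr = child((Element) n, "pPr");
                    Element sectPr = (pPr == null) ? null : child(pPr, "sectPr");
                    if (sectPr != null) {
                        sections.add("paragraph " + paragraphIndex + ": footers=" + footerRefs(sectPr));
                    }
                    paragraphIndex++;
                }
                Element bodySectPr = child(body, "sectPr");
                if (bodySectPr != null) {
                    sections.add("body: footers=" + footerRefs(bodySectPr));
                }
                return sections;
            } catch (IllegalArgumentException e) {
                throw e;
            } catch (Exception e) {
                throw new IllegalArgumentException("could not parse document XML", e);
            }
        }

        public static void main(String[] args) {
            // prints [paragraph 1: footers=0, body: footers=1]
            System.out.println(describeSections(SAMPLE));
        }
    }
    ```

    In the sample, the first section ends at the paragraph with index 1 and has no footer reference, while the last section takes its footer from the section properties in the document body. This is the state the asker is after.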

    The following code demonstrates this. It searches for a paragraph containing a certain text. Before that paragraph, it inserts a new paragraph whose section properties are a copy of the former last section's properties. Then it removes all header/footer settings from the newly inserted section properties. As a result, the section above the newly inserted paragraph has no header/footer settings, while the former header/footer settings remain in place for the last section.

    import java.io.*;
    
    import org.apache.poi.xwpf.usermodel.*;
    
    public class WordInsertSectionbreak {
        
     static org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSectPr getDocumentBodySectPr(XWPFDocument document) {
      org.openxmlformats.schemas.wordprocessingml.x2006.main.CTDocument1 ctDocument = document.getDocument();
      org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBody ctBody = ctDocument.getBody();
      org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSectPr ctSectPrDocumentBody = ctBody.getSectPr();
      return ctSectPrDocumentBody;   
     }
        
     static org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSectPr getNextSectPr(XWPFParagraph paragraph) {
      // get the section settings of next section in document
      org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSectPr ctSectPrNextSect = null;
      // maybe next section settings are in a paragraph
      XWPFDocument document = paragraph.getDocument(); 
      int pos = document.getPosOfParagraph(paragraph);
      for (int p = pos; p < document.getParagraphs().size(); p++) {
       paragraph = document.getParagraphArray(p);
       if (paragraph.getCTP().getPPr() != null) {
        ctSectPrNextSect = paragraph.getCTP().getPPr().getSectPr();   
       }
       if (ctSectPrNextSect != null) break;
      }
      // if not found in a paragraph, the next section settings are in the document body
      if (ctSectPrNextSect == null) { 
       ctSectPrNextSect = getDocumentBodySectPr(document);
      }
      return ctSectPrNextSect;   
     }
         
     static XWPFParagraph insertSectionbreak(XWPFDocument document, org.apache.xmlbeans.XmlCursor cursor) {
      XWPFParagraph paragraph = null;
      // insert a paragraph for section settings for new section above and section break.
      paragraph = document.insertNewParagraph(cursor);
      // get next section properties, which were section properties for previous section above
      org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSectPr ctSectPrNextSect = getNextSectPr(paragraph);
      // set a copy of section properties for previous section above as section properties for new section
      if (ctSectPrNextSect != null) {
       org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSectPr ctSectPrNewSect = (org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSectPr)ctSectPrNextSect.copy();
       paragraph.getCTP().addNewPPr().setSectPr(ctSectPrNewSect);  
       return paragraph;  
      }
      return null;
     } 
    
     static XWPFParagraph getParagraphByText(XWPFDocument document, String text) {
      for (XWPFParagraph paragraph : document.getParagraphs()) {
       String paragraphText = paragraph.getText();
       if (paragraphText.contains(text)) {
        return paragraph;  
       }
      }
      return null;  
     }
     
     static void removeHeadersAndFooters(XWPFParagraph sectionBreakParagraph) {
      if (sectionBreakParagraph == null) return;
      if (sectionBreakParagraph.getCTP().getPPr() != null) {
       org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSectPr ctSectPr = sectionBreakParagraph.getCTP().getPPr().getSectPr();
       // nothing to do if the paragraph carries no section properties
       if (ctSectPr == null) return;
       // remove header and footer references from the section, iterating backwards so the indices stay valid
       for (int i = ctSectPr.getHeaderReferenceArray().length-1; i >= 0; i--) {
        ctSectPr.removeHeaderReference(i);
       }
       for (int i = ctSectPr.getFooterReferenceArray().length-1; i >= 0; i--) {
        ctSectPr.removeFooterReference(i);
       }
      }
     }
     
     public static void main(String[] args) throws Exception {
    
      // try-with-resources closes the input stream and the document even if an exception is thrown
      try (FileInputStream in = new FileInputStream("./WordDocument.docx");
           XWPFDocument document = new XWPFDocument(in)) {
      
       XWPFParagraph paragraph = getParagraphByText(document, "Some text to mark where section break shall be inserted");
       if (paragraph != null) {
        XWPFParagraph sectionBreakParagraph = insertSectionbreak(document, paragraph.getCTP().newCursor());
        if (sectionBreakParagraph != null) {
         removeHeadersAndFooters(sectionBreakParagraph);
        }  
       }  
      
       try (FileOutputStream out = new FileOutputStream("./WordDocumentResult.docx")) {
        document.write(out);
       }
      }
     }
    }
    

    The code was tested and works with Apache POI 5.2.2, the current version at the time of writing.