
JAVA: Extract Footer Images from a docx document


I have a task of extracting all the images from a docx file. I am using the snippet below, which relies on the Apache POI API.

`File file = new File(InputFileString);
FileInputStream fs = new FileInputStream(file.getAbsolutePath());
// create an Office Word 2007+ document object to wrap the Word file
XWPFDocument doc1x = new XWPFDocument(fs);
// get all images from the document and store them in the list piclist
List<XWPFPictureData> piclist = doc1x.getAllPictures();
// traverse the list and write each image to a file
Iterator<XWPFPictureData> iterator = piclist.iterator();
int i = 0;
while (iterator.hasNext()) {
    XWPFPictureData pic = iterator.next();
    byte[] bytepic = pic.getData();
    BufferedImage imag = ImageIO.read(new ByteArrayInputStream(bytepic));
    ImageIO.write(imag, "jpg", new File("C:/imagefromword" + i + ".jpg"));
    i++;
}`

However, this code cannot detect any images which are in the footer or header section of the document.

I've searched Google extensively and couldn't come up with anything useful.

Is there any way to capture the images in the footer section of the docx file?


Solution

  • I am no expert on Apache POI issues, but a simple search came up with this code:

    package com.concretepage;

    import java.io.FileInputStream;
    import org.apache.poi.openxml4j.opc.OPCPackage;
    import org.apache.poi.xwpf.model.XWPFHeaderFooterPolicy;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;
    import org.apache.poi.xwpf.usermodel.XWPFFooter;
    import org.apache.poi.xwpf.usermodel.XWPFHeader;

    public class ReadDOCXHeaderFooter {
        public static void main(String[] args) {
            try {
                FileInputStream fis = new FileInputStream("D:/docx/read-test.docx");
                XWPFDocument xdoc = new XWPFDocument(OPCPackage.open(fis));
                XWPFHeaderFooterPolicy policy = new XWPFHeaderFooterPolicy(xdoc);
                // read header
                XWPFHeader header = policy.getDefaultHeader();
                System.out.println(header.getText());
                // read footer
                XWPFFooter footer = policy.getDefaultFooter();
                System.out.println(footer.getText());
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }
    }
    

    The documentation page for XWPFHeaderFooter (the direct parent class of the XWPFHeader and XWPFFooter classes in the example above) lists the same getAllPictures method you used to iterate over the pictures in the document's body.

    I'm on mobile, so I haven't tested anything, but it seems straightforward enough to work.

    Good luck!
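
As a POI-free cross-check, note that a .docx file is a ZIP archive, and Word conventionally stores embedded images as entries under word/media/ whether they are referenced from the body, a header, or a footer. The following is a minimal sketch under that assumption, using only java.util.zip; the class name DocxMediaExtractor and the method extractMedia are hypothetical names chosen for illustration. It is not the POI approach described above, and it will not find linked (non-embedded) images or images a producer has stored under another path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: copies every ZIP entry under word/media/ out of a .docx file.
public class DocxMediaExtractor {

    // Assumption: embedded images live under this prefix, as Word conventionally writes them.
    private static final String MEDIA_PREFIX = "word/media/";

    /** Writes each media entry into outputDir and returns the paths that were written. */
    public static List<Path> extractMedia(Path docx, Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        List<Path> written = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(Files.newInputStream(docx))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (entry.isDirectory() || !name.startsWith(MEDIA_PREFIX)) {
                    continue;
                }
                // Keep only the last path segment so an entry cannot write outside outputDir.
                String fileName = name.substring(name.lastIndexOf('/') + 1);
                Path target = outputDir.resolve(fileName);
                // Files.copy reads up to the end of the current entry and leaves the stream open.
                Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
                written.add(target);
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: java DocxMediaExtractor <file.docx> <output-dir>");
            return;
        }
        for (Path p : extractMedia(Paths.get(args[0]), Paths.get(args[1]))) {
            System.out.println("Wrote " + p);
        }
    }
}
```

Because the bytes are copied as stored, each image keeps its original format and extension instead of being re-encoded as JPEG. It also avoids ImageIO.read, which returns null for formats it has no reader for.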