I am trying to merge two docx documents and have covered most of my use case. Text and tables merge successfully, but for images in the docx file the merged document shows a placeholder instead of the image itself. Here is my code snippet for reference:
def document
Integer i
Integer j

void mergeDocx(FileInputStream test1, FileInputStream test2, FileOutputStream dest) {
    i = 0
    j = 0
    XWPFDocument doc1 = new XWPFDocument(test1)
    XWPFDocument doc2 = new XWPFDocument(test2)
    document = new XWPFDocument()
    parseElement(doc1)
    parseElement(doc2)
    parseStyle(doc1, doc2)
    OutputStream out = dest
    document.write(out)
    out.close()
}
This is the base version of parseElement(XWPFDocument doc) I started with:
void parseElement(XWPFDocument doc) {
    for (IBodyElement e : doc.getBodyElements()) {
        if (e instanceof XWPFParagraph) {
            XWPFParagraph p = (XWPFParagraph) e
            if (p.runs.embeddedPictures.flatten()) {
                p.runs.each { r ->
                    r.embeddedPictures.each { pic ->
                        document.addPictureData(pic.pictureData.data, pic.pictureData.pictureType)
                    }
                }
            } else {
                if (p.getCTP().getPPr() != null && p.getCTP().getPPr().getSectPr() != null) {
                    continue
                } else {
                    document.createParagraph()
                    document.setParagraph(p, i)
                    i++
                }
            }
        } else if (e instanceof XWPFTable) {
            XWPFTable t = (XWPFTable) e
            document.createTable()
            document.setTable(j, t)
            j++
        }
    }
}
This is the alternate version of parseElement(XWPFDocument doc) I used:
void parseElement(XWPFDocument doc) {
    for (IBodyElement e : doc.getBodyElements()) {
        if (e instanceof XWPFParagraph) {
            XWPFParagraph p = (XWPFParagraph) e
            if (p.runs.embeddedPictures.flatten()) {
                p.runs.each { r ->
                    r.embeddedPictures.each { pic ->
                        XWPFParagraph title = document.createParagraph()
                        XWPFRun run = title.createRun()
                        run.setText("Fig.1 A Natural Scene")
                        run.setBold(true)
                        title.setAlignment(ParagraphAlignment.CENTER)
                        run.addBreak()
                        run.addPicture(new ByteArrayInputStream(pic.pictureData.data), XWPFDocument.PICTURE_TYPE_JPEG, pic.pictureData.fileName, Units.toEMU(200), Units.toEMU(200))
                    }
                }
            } else {
                if (p.getCTP().getPPr() != null && p.getCTP().getPPr().getSectPr() != null) {
                    continue
                } else {
                    document.createParagraph()
                    document.setParagraph(p, i)
                    i++
                }
            }
        } else if (e instanceof XWPFTable) {
            XWPFTable t = (XWPFTable) e
            document.createTable()
            document.setTable(j, t)
            j++
        }
    }
}
The problem here is that whenever an image is encountered, it is treated as an instance of XWPFParagraph, and the code then tries to call setParagraph(), which I know I should not use for images.
Here is what my Word docx looks like after the merge.
I am using Apache POI for this, but I am open to a solution using docx4j as well. Any guidance would be appreciated.
P.S. The programming language is Groovy.
Updating my parseElement() method to this worked for me:
void parseElement(XWPFDocument doc) {
    for (IBodyElement e : doc.getBodyElements()) {
        if (e instanceof XWPFParagraph) {
            XWPFParagraph p = (XWPFParagraph) e
            if (p.runs.embeddedPictures.flatten()) {
                // Paragraph contains at least one picture: re-add each picture to the
                // target document instead of copying the paragraph with setParagraph().
                p.runs.each { r ->
                    r.embeddedPictures.each { pic ->
                        XWPFParagraph p1 = document.createParagraph()
                        XWPFRun r1 = p1.createRun()
                        // The extent (cx, cy) is already in EMU, the unit addPicture() expects.
                        int width = pic.getCTPicture().getSpPr().getXfrm().getExt().getCx() as int
                        int height = pic.getCTPicture().getSpPr().getXfrm().getExt().getCy() as int
                        int imgFormat1 = getImageFormat(pic.pictureData.fileName)
                        r1.addPicture(new ByteArrayInputStream(pic.pictureData.data), imgFormat1, pic.pictureData.fileName, width, height)
                        // createParagraph() added a paragraph, so the index used by
                        // setParagraph() below has to advance as well.
                        i++
                    }
                }
            } else {
                if (p.getCTP().getPPr() != null && p.getCTP().getPPr().getSectPr() != null) {
                    // Skip paragraphs that only carry section properties.
                    continue
                } else {
                    document.createParagraph()
                    document.setParagraph(p, i)
                    i++
                }
            }
        } else if (e instanceof XWPFTable) {
            XWPFTable t = (XWPFTable) e
            document.createTable()
            document.setTable(j, t)
            j++
        }
    }
}
One thing that I was missing was the i++ whenever I encountered an image within a paragraph.
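The working version calls getImageFormat(pic.pictureData.fileName), but that helper is not shown in the post. A minimal sketch of what it could look like follows, assuming it only needs to map the file extension to one of the PICTURE_TYPE_* int constants on Apache POI's Document interface (the same constants reachable through XWPFDocument). The method body, the set of extensions handled, and the choice to throw on unknown extensions are all assumptions, not the original author's code.

```groovy
import org.apache.poi.xwpf.usermodel.Document

// Hypothetical helper (not from the original post): maps a picture file name
// to the matching POI picture-type constant, based on its extension only.
int getImageFormat(String fileName) {
    String name = fileName.toLowerCase(Locale.ROOT)
    int dot = name.lastIndexOf('.')
    String ext = dot >= 0 ? name.substring(dot + 1) : ''
    switch (ext) {
        case 'emf':  return Document.PICTURE_TYPE_EMF
        case 'wmf':  return Document.PICTURE_TYPE_WMF
        case 'pict': return Document.PICTURE_TYPE_PICT
        case 'jpg':
        case 'jpeg': return Document.PICTURE_TYPE_JPEG
        case 'png':  return Document.PICTURE_TYPE_PNG
        case 'dib':  return Document.PICTURE_TYPE_DIB
        case 'gif':  return Document.PICTURE_TYPE_GIF
        case 'tif':
        case 'tiff': return Document.PICTURE_TYPE_TIFF
        case 'eps':  return Document.PICTURE_TYPE_EPS
        case 'bmp':  return Document.PICTURE_TYPE_BMP
        case 'wpg':  return Document.PICTURE_TYPE_WPG
        default:
            // Assumption: failing loudly is preferable to guessing a format.
            throw new IllegalArgumentException("Unsupported picture format: ${fileName}")
    }
}
```

Since the first attempt already reads pic.pictureData.pictureType, passing that value straight to addPicture() may make the helper unnecessary. That has the added benefit of not depending on the file extension being accurate.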