I am trying to convert a docx document to PDF without changing its content. I am using the 'export-FO' approach, because the 'Microsoft Graph' and 'documents4j' approaches do not meet my requirements. My document contains a numbered list, which produces an artifact in the resulting PDF: the first number in the list is always overlaid with the number that would follow the last item (last + 1) of the same list.
What causes this kind of behavior? What can I do to fix it?
Here is the link to the representative image of this artifact
This is the sample code I use to convert documents:
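import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.docx4j.Docx4J;
import org.docx4j.fonts.BestMatchingMapper;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

public class Main {
    public static void main(String[] args) throws Exception {
        // try-with-resources closes both streams, even if the conversion fails
        try (InputStream templateInputStream = new FileInputStream("document.docx");
             OutputStream os = new FileOutputStream("document.pdf")) {
            WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(templateInputStream);
            wordMLPackage.setFontMapper(new BestMatchingMapper());
            Docx4J.toPDF(wordMLPackage, os);
        }
    }
}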
Here is the list of dependencies in the sample project:
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-core</artifactId>
    <version>11.5.2</version>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-export-fo</artifactId>
    <version>11.5.2</version>
</dependency>
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-ReferenceImpl</artifactId>
    <version>11.5.2</version>
</dependency>
<dependency>
    <groupId>org.apache.xmlgraphics</groupId>
    <artifactId>fop</artifactId>
    <version>2.10</version>
</dependency>
And here is the source docx document: Google Drive link here
This seems to be caused by the PP_COMMON_CONTAINERIZATION conversion feature.
It groups the list items in a content control, then seems to incorrectly number the content control as well.
You need to turn that feature off, but Docx4J.toPDF doesn't give you that option.
Instead, you can use:
FOSettings foSettings = Docx4J.createFOSettings();
foSettings.setOpcPackage(wordMLPackage);
foSettings.getFeatures().remove(ConversionFeatures.PP_COMMON_CONTAINERIZATION);
Docx4J.toFO(foSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);
Or:
FOSettings foSettings = Docx4J.createFOSettings();
foSettings.setOpcPackage(wordMLPackage);
Docx4J.toFO(foSettings, os, Docx4J.FLAG_EXPORT_PREFER_NONXSL); // NONXSL ignores content controls
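For completeness, here is a minimal sketch that combines the first variant with the loading code from the question. The class name ConvertWithoutContainerization and the helper toPdf are made up for illustration, and the explicit setApacheFopMime call is an addition of mine to make the output format unambiguous. It assumes the docx4j 11.5.2 dependencies listed in the question are on the classpath.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import org.docx4j.Docx4J;
import org.docx4j.convert.out.ConversionFeatures;
import org.docx4j.convert.out.FOSettings;
import org.docx4j.fonts.BestMatchingMapper;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

// Hypothetical class name; not part of docx4j.
public class ConvertWithoutContainerization {

    // Hypothetical helper: renders the package via XSL-FO with the
    // containerization preprocessing step removed from the feature set.
    static void toPdf(WordprocessingMLPackage pkg, OutputStream os) throws Exception {
        FOSettings foSettings = Docx4J.createFOSettings();
        foSettings.setOpcPackage(pkg);
        // Request PDF output explicitly (assumption: added for clarity).
        foSettings.setApacheFopMime("application/pdf");
        // Skip grouping of list items into a content control.
        foSettings.getFeatures().remove(ConversionFeatures.PP_COMMON_CONTAINERIZATION);
        Docx4J.toFO(foSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);
    }

    public static void main(String[] args) throws Exception {
        // File names are the ones used in the question.
        try (InputStream in = new FileInputStream("document.docx");
             OutputStream os = new FileOutputStream("document.pdf")) {
            WordprocessingMLPackage pkg = WordprocessingMLPackage.load(in);
            pkg.setFontMapper(new BestMatchingMapper());
            toPdf(pkg, os);
        }
    }
}
```

If the artifact still shows up with this, the second variant (FLAG_EXPORT_PREFER_NONXSL) is worth trying, since it ignores content controls altogether.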
Now tracking at https://github.com/plutext/docx4j/issues/607