I'm running some tests with docx4j. I need to convert complex Word documents (2-3 pages of text, tables, bulleted lists, images) into XHTML.
I took this example: https://github.com/plutext/docx4j/blob/master/src/samples/docx4j/org/docx4j/samples/ConvertOutHtml.java
and it works fine, but I have a couple of concerns:
Converting a Word document takes around 30 seconds. This line accounts for 95% of the computation time:
wordMLPackage = Docx4J.load(new java.io.File(inputfilepath));
The final goal is to create a simple webapp that receives a Word document (a different one each time) and returns XHTML. A user can't wait that long. Is there anything I can do to improve performance? Why does it take so long (Tika, for example, is hundreds of times faster)? Currently I'm running it from the Eclipse IDE on my laptop, which is a fast PC anyway. Do you think it will be better once it is running server side?
Thanks a lot.
Loading the JAXB Context takes time. It is typically done once per JVM, so the first load will be slow and later loads should be much faster. That said, it shouldn't take 30 seconds! On my aging laptop, it takes around 5 seconds.
You might enable logging to get more insight into how much of that time is JAXB Context initialization.
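A quick way to check whether the cost is one-time initialization is to time two loads in the same JVM and compare them. The sketch below is only an illustration: `LoadTimer`, `timeMillis` and `SlowInit` are hypothetical names, and `SlowInit` is a stand-in that simulates a one-time setup cost with a 200 ms sleep so the example runs without docx4j on the classpath. In the real program, the lambdas would call `Docx4J.load(new java.io.File(inputfilepath))` instead.

```java
import java.util.concurrent.Callable;

public class LoadTimer {

    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    public static long timeMillis(Callable<?> task) throws Exception {
        long start = System.nanoTime();
        task.call();
        return (System.nanoTime() - start) / 1_000_000;
    }

    /**
     * Hypothetical stand-in for a library with an expensive one-time setup.
     * It is NOT docx4j; it only mimics "slow first call, fast afterwards".
     */
    public static final class SlowInit {
        private static boolean initialised = false;

        public static synchronized String load(String name) throws InterruptedException {
            if (!initialised) {
                Thread.sleep(200); // simulated one-time context creation
                initialised = true;
            }
            return "loaded:" + name;
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: in the real test, replace SlowInit.load(...) with
        // Docx4J.load(new java.io.File(inputfilepath)).
        long first = timeMillis(() -> SlowInit.load("a.docx"));
        long second = timeMillis(() -> SlowInit.load("b.docx"));
        System.out.println("first load:  " + first + " ms");
        System.out.println("second load: " + second + " ms");
    }
}
```

If the second load is much faster than the first, most of the delay is one-time setup. In a webapp that cost could then be paid once at startup (for example by loading a small document when the application starts) rather than on the first user request.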