
How to convert HTML to .docx using docx4j?


I read some articles about converting HTML to .docx and found that docx4j gives pretty decent results. Could anyone provide the following info:

  1. The required JARs and their versions.
  2. Sample code for converting HTML to .docx.

Sorry, I can't post anything I've tried, because I haven't attempted this task yet. I do already use Apache POI to convert the byte[] I get from the database to HTML, for output in a rich text editor in a JSF application. Please enlighten me; I'm lost in stress and confusion!


Solution

  • To import XHTML, use the docx4j-ImportXHTML artifact:

    <dependency>
        <groupId>org.docx4j</groupId>
        <artifactId>docx4j-ImportXHTML</artifactId>
        <version>3.0.0</version>
    </dependency>
    

    For more on the Maven setup, see http://www.docx4java.org/blog/2013/11/docx4j-3-0-and-maven/

    For sample code, see https://github.com/plutext/docx4j-ImportXHTML/tree/master/src/samples/java/org/docx4j/samples
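
    As a rough idea of what those samples do, here is a minimal sketch of the conversion. The class name `XhtmlToDocx`, the input string, and the output file name are placeholders made up for illustration; the docx4j calls assume docx4j-ImportXHTML 3.0.x on the classpath and may differ in other versions, so treat the linked samples as the authority.

    ```java
    import java.io.File;
    import java.util.List;

    import org.docx4j.convert.in.xhtml.XHTMLImporterImpl;
    import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

    public class XhtmlToDocx {

        // Converts a well-formed XHTML string into a .docx file.
        // (Hypothetical helper; not part of docx4j itself.)
        public static void convert(String xhtml, File output) throws Exception {
            // Start from an empty WordprocessingML package
            WordprocessingMLPackage pkg = WordprocessingMLPackage.createPackage();

            // The importer turns XHTML into WordprocessingML objects
            XHTMLImporterImpl importer = new XHTMLImporterImpl(pkg);

            // The second argument is the base URL used to resolve relative
            // links and images; null is passed here because there are none
            List<Object> content = importer.convert(xhtml, null);

            // Append the converted content to the document body and save
            pkg.getMainDocumentPart().getContent().addAll(content);
            pkg.save(output);
        }

        public static void main(String[] args) throws Exception {
            String xhtml = "<html><body><p>Hello from <b>docx4j</b></p></body></html>";
            convert(xhtml, new File("out.docx"));
        }
    }
    ```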

    Note that your input needs to be well-formed XML, so if you have plain HTML you'll need to tidy it first, using one of the many Java libraries that can do this for you (JTidy or jsoup, for example).
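
If you are not sure whether your markup is already well-formed, a quick check with the JDK's built-in XML parser can tell you before you hand it to the importer. This is only a sketch: the class `WellFormedCheck` and the method `isWellFormed` are names invented here, and it only detects the problem; the actual tidying still has to be done by a separate library.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    // Returns true if the markup parses as well-formed XML, false otherwise.
    public static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the input: not well-formed
            return false;
        } catch (ParserConfigurationException | IOException e) {
            // Environment problem rather than a markup problem
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // XHTML style: every element is closed
        System.out.println(isWellFormed("<p>Hello<br/>world</p>")); // prints "true"
        // HTML style: unclosed <br>, fine for browsers but not for an XML parser
        System.out.println(isWellFormed("<p>Hello<br>world</p>"));  // prints "false"
    }
}
```

If the check returns false, run the markup through your tidy step first and pass the cleaned result to the importer.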