DOCX4J: Converting HTML to Docx - Table Formatting


I am converting a DOCX file to HTML and then back to DOCX. The final DOCX is generated successfully, but the round trip distorts the table formatting: the cells in the final document are wider than in the original, so the table extends beyond the page boundary.

  • In the original DOCX, each column is 8.15 cm wide (table width 16.30 cm).
  • After conversion to HTML, the table has width: 6.42in.
  • After conversion back to DOCX, each column is 10.76 cm wide (table width 21.52 cm).
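
As a sanity check on these figures, the short Java sketch below converts them to a common unit. It is only illustrative arithmetic: the class and method names are made up for this example, and the one outside fact it relies on is that Word table widths are typically stored in twentieths of a point (1440 per inch). The numbers suggest that the HTML width (6.42 in, about 16.31 cm) still matches the original table, so the roughly 1.32x growth appears in the HTML-to-DOCX step rather than in the DOCX-to-HTML step.

```java
import java.util.Locale;

// Hypothetical helper for this example only; not part of docx4j.
public class TableWidthUnits {
    static final double CM_PER_INCH = 2.54;
    // 1 twip = 1/20 pt and there are 72 pt per inch, so 1440 twips per inch.
    static final int TWIPS_PER_INCH = 1440;

    static double inchesToCm(double inches) {
        return inches * CM_PER_INCH;
    }

    static long cmToTwips(double cm) {
        return Math.round(cm / CM_PER_INCH * TWIPS_PER_INCH);
    }

    public static void main(String[] args) {
        double originalCm = 16.30;   // table width in the original DOCX
        double htmlInches = 6.42;    // table width reported in the HTML
        double roundTripCm = 21.52;  // table width after converting back

        // Locale.ROOT keeps the decimal separator a dot on every machine.
        System.out.println(String.format(Locale.ROOT,
                "HTML width in cm: %.2f", inchesToCm(htmlInches)));
        System.out.println("Original width in twips: " + cmToTwips(originalCm));
        System.out.println("Round-trip width in twips: " + cmToTwips(roundTripCm));
        System.out.println(String.format(Locale.ROOT,
                "Growth factor: %.2f", roundTripCm / originalCm));
    }
}
```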

Is there a way for me to keep the same format after conversion? Any advice is greatly appreciated.

Below is my code:

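    // Imports this method relies on (commons-lang 2.x for unescapeHtml;
    // newer docx4j versions may use jakarta.xml.bind instead of javax.xml.bind)
    import java.io.File;
    import java.io.IOException;
    import javax.xml.bind.JAXBException;
    import org.apache.commons.io.FileUtils;
    import org.apache.commons.lang.StringEscapeUtils;
    import org.docx4j.convert.in.xhtml.FormattingOption;
    import org.docx4j.convert.in.xhtml.XHTMLImporterImpl;
    import org.docx4j.jaxb.Context;
    import org.docx4j.openpackaging.exceptions.Docx4JException;
    import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
    import org.docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart;
    import org.docx4j.wml.RFonts;

    private void convertHtmlToDocx() throws IOException, JAXBException, Docx4JException {
        // Convert the HTML back to DOCX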

        String inputfilepath = System.getProperty("user.dir") + "myPath";
        String baseURL = "file:///"+System.getProperty("user.dir")+"path";

        String stringFromFile = FileUtils.readFileToString(new File(inputfilepath), "UTF-8");

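        // The file may hold entity-escaped markup (e.g. "&lt;/p&gt;");
        // unescape it only in that case, otherwise use it as-is
        String unescaped = stringFromFile;
        if (stringFromFile.contains("&lt;/")) {
            unescaped = StringEscapeUtils.unescapeHtml(stringFromFile);
        }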

        System.out.println("Unescaped: " + unescaped);

        // Setup font mapping
        RFonts rfonts = Context.getWmlObjectFactory().createRFonts();
        rfonts.setAscii("Century Gothic");
        XHTMLImporterImpl.addFontMapping("Century Gothic", rfonts);

        // Create an empty docx package
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();

        NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
        wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
        ndp.unmarshalDefaultNumbering();        

        // Convert the XHTML, and add it into the empty docx we made
        XHTMLImporterImpl XHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
        XHTMLImporter.setTableFormatting(FormattingOption.IGNORE_CLASS);
        XHTMLImporter.setParagraphFormatting(FormattingOption.IGNORE_CLASS);
        XHTMLImporter.setHyperlinkStyle("Hyperlink");
        wordMLPackage.getMainDocumentPart().getContent().addAll(XHTMLImporter.convert(unescaped, baseURL) );


        wordMLPackage.save(new java.io.File(System.getProperty("user.dir") + "myPath") );

    }

Solution

  • Is your use case web-based editing via XHTML roundtrip?

    If so, maybe docx-html-editor helps. It works by saving state/hints which are used in the round trip process.

    Aside from this, tables in Word either have fixed cell widths or they don't. Is the behaviour you describe occurring with a fixed-width table, or not?

    Fixed width should be OK (or easy enough to make so). Non-fixed is harder...
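
For the fixed-width case, one possible post-processing step is to rescale the imported column widths so that they add up to the original table width again. The sketch below shows only that arithmetic, in twentieths of a point (the `dxa` unit used by `w:tblW`, `w:gridCol` and `w:tcW` in WordprocessingML). It is an assumption about how one might approach this, not something docx4j does for you: `FixedTableWidths` and `fitColumns` are hypothetical names, and writing the results back into the table (together with `w:tblLayout w:type="fixed"`) is left out.

```java
import java.util.Arrays;

// Hypothetical helper for this example only; not part of docx4j.
public class FixedTableWidths {

    /**
     * Scales column widths (in twips) proportionally so that they sum
     * exactly to targetTwips. Any rounding remainder goes to the last column.
     */
    static int[] fitColumns(int[] columnTwips, int targetTwips) {
        long total = 0;
        for (int w : columnTwips) {
            total += w;
        }
        if (total <= 0) {
            throw new IllegalArgumentException(
                    "column widths must sum to a positive value");
        }
        int[] fitted = new int[columnTwips.length];
        int used = 0;
        for (int i = 0; i < columnTwips.length; i++) {
            // Integer division rounds down, so 'used' never exceeds the target.
            fitted[i] = (int) ((long) columnTwips[i] * targetTwips / total);
            used += fitted[i];
        }
        fitted[fitted.length - 1] += targetTwips - used;
        return fitted;
    }

    public static void main(String[] args) {
        int[] imported = {6100, 6100}; // two columns of about 10.76 cm each
        int target = 9241;             // about 16.30 cm, the original table width
        System.out.println(Arrays.toString(fitColumns(imported, target)));
    }
}
```

With the figures from the question, each column comes back to roughly 8.15 cm. Proportional scaling keeps the relative column sizes from the HTML, which seems a reasonable default when the original per-column widths were not saved anywhere.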