Tags: ms-word, cross-platform, markup, docx

Conversion between docx / doc / rtf and lightweight markup


I am looking for a tool or set of tools to convert between file formats D and M where

  • D is a format handled by MS Word: in order of preference, docx, doc, or rtf
  • M is a lightweight markup such as Markdown, Textile, or txt2tags; it can be an esoteric one
  • there is a way to generate html from M
  • conversion is two-way: both from D to M and from M to D
  • utf-8 encoding is handled properly
  • the content is simple: paragraphs, some basic formatting like bold and italics, maybe lists
  • the tools are platform-independent
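
To make the D to M direction concrete, here is a minimal sketch of what I mean, assuming D is docx and M is Markdown. A docx file is a zip archive whose main text lives in word/document.xml, so the Python standard library is enough for paragraphs, bold and italics. The function name docx_to_markdown is made up for illustration; lists, comments, tracked changes, and whitespace at the edges of bold or italic runs are not handled.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx archive.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _is_on(props, tag):
    """Return True if a toggle property such as w:b or w:i is switched on."""
    if props is None:
        return False
    element = props.find(W + tag)
    if element is None:
        return False
    # A bare <w:b/> means "on"; an explicit w:val can switch it off again.
    return element.get(W + "val", "true") not in ("false", "0", "off")


def docx_to_markdown(source):
    """Convert a simple .docx (path or file object) to Markdown text.

    Hypothetical helper: only paragraphs, bold and italics are handled.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(W + "p"):
        pieces = []
        for run in para.iter(W + "r"):
            text = "".join(t.text or "" for t in run.iter(W + "t"))
            if not text:
                continue
            props = run.find(W + "rPr")
            if _is_on(props, "b"):
                text = "**" + text + "**"
            if _is_on(props, "i"):
                text = "*" + text + "*"
            pieces.append(text)
        if pieces:
            paragraphs.append("".join(pieces))
    # Markdown separates paragraphs with a blank line.
    return "\n\n".join(paragraphs)
```

Usage would be something like print(docx_to_markdown("report.docx")), where report.docx is a placeholder file name. The returned value is an ordinary Python string, so it can be written out as utf-8.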

What I've found so far

  • TeX, LaTeX -- too heavyweight
  • docx2txt -- too lightweight, it supports no formatting at all
  • html -- MSWord produces bloated html
  • a few one-way conversions, like doc to mediawiki

UPDATE:

The use case is a document workflow between technical and non-technical people

  • I, the technical guy, edit a document in plain text, put it into version control, etc.
  • I send it to my manager or other non-technical people
  • They add comments and make changes to it in Word, then send it back to me
  • I want to simply grok their changes, make my changes, put it into version control, without having to use Word
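
The "grok their changes" step could look roughly like this once the returned Word file has been converted back to M: compare the converted text against the copy in version control. This is only a sketch using Python's difflib; the function name markup_diff and the default file name are made up for illustration.

```python
import difflib


def markup_diff(mine, theirs, name="document.md"):
    """Return a unified diff between my markup and the markup converted from Word.

    Hypothetical helper: both arguments are plain strings of markup text.
    """
    diff = difflib.unified_diff(
        mine.splitlines(),
        theirs.splitlines(),
        fromfile=name + " (version control)",
        tofile=name + " (from Word)",
        lineterm="",
    )
    # An empty string means the two versions are identical.
    return "\n".join(diff)
```

In practice the version control system's own diff does the same job once the converted file is saved over the working copy; the point is that the comparison happens on the markup, not on the docx.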

Solution

  • Adam, I've used docx4j to convert docx to html, edited the html in CKEditor, and then used docx4j to convert the html back to docx. My process made some assumptions about the css (i.e. it was designed to handle docx4j's clean html, and editing in CKEditor).

    You don't say whether there is a way to generate M from HTML.
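
    If M were Markdown and the html were as clean as described above, the HTML to M step for simple content might be sketched like this with Python's standard html.parser. This is an illustration under those assumptions, not part of the docx4j process; the class and function names are made up, and only paragraphs, bold, italics and flat list items are handled.

```python
from html.parser import HTMLParser


class SimpleHtmlToMarkdown(HTMLParser):
    """Hypothetical converter for paragraphs, bold, italics and flat lists."""

    MARKERS = {"b": "**", "strong": "**", "i": "*", "em": "*"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []   # finished paragraphs and list items
        self.current = []  # text pieces of the block being built

    def _flush(self):
        # Collapse runs of whitespace, then store the block if it is non-empty.
        text = " ".join("".join(self.current).split())
        if text:
            self.blocks.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in self.MARKERS:
            self.current.append(self.MARKERS[tag])
        elif tag in ("p", "li"):
            self._flush()
            if tag == "li":
                self.current.append("- ")

    def handle_endtag(self, tag):
        if tag in self.MARKERS:
            self.current.append(self.MARKERS[tag])
        elif tag in ("p", "li"):
            self._flush()

    def handle_data(self, data):
        self.current.append(data)


def html_to_markdown(html):
    """Convert a small, clean HTML fragment to Markdown text."""
    parser = SimpleHtmlToMarkdown()
    parser.feed(html)
    parser.close()
    parser._flush()  # pick up any trailing text outside <p> or <li>
    return "\n\n".join(parser.blocks)
```

    List items come out separated by blank lines, which Markdown treats as a loose list. Bloated html of the kind Word itself exports would need much more cleanup than this.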