Tags: xml, parsing, html-parsing, xml-parsing, docbook

What's the best parser to get DocBook from Word, RTF, etc.?


I need to know what's the best solution for my problem.

I want to build a DocBook editor that receives a Word file (or any other rich-text format) and lets you modify the content and style to build up a DocBook document.

Essentially, with this question I want to find out the best way to achieve this result.

Is it better to:

-> upload the file
-> parse it to DocBook
-> transform it to XHTML
-> modify it with a WYSIWYG editor
-> save the changes to DocBook

or

-> upload the file
-> transform it to XHTML
-> modify the XHTML with a WYSIWYG editor
-> convert the XHTML to DocBook
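To give an idea of what the last step of the second workflow involves, here is a minimal sketch using only Python's standard library (`xml.etree.ElementTree`). It is a hand-rolled mapping, not an existing tool, and it assumes the WYSIWYG editor emits namespaced XHTML that uses only `h1`, `h2`, `p`, `em`, `strong` and `code`, with `h1` first; it targets DocBook 5. The names `xhtml_to_docbook` and `INLINE_MAP` are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
DOCBOOK_NS = "http://docbook.org/ns/docbook"

# Serialize DocBook elements with a default xmlns instead of an "ns0:" prefix.
ET.register_namespace("", DOCBOOK_NS)

# Hypothetical, deliberately tiny mapping of XHTML inline elements to DocBook ones.
INLINE_MAP = {"em": "emphasis", "strong": "emphasis", "code": "literal"}


def _local(tag):
    """Strip the '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def _db(name):
    """Build a fully qualified DocBook tag name."""
    return f"{{{DOCBOOK_NS}}}{name}"


def _convert_inline(src, dst):
    """Copy the text and inline children of an XHTML element into a DocBook element."""
    dst.text = src.text
    for child in src:
        name = _local(child.tag)
        new = ET.SubElement(dst, _db(INLINE_MAP.get(name, "phrase")))
        if name == "strong":
            new.set("role", "bold")  # common DocBook idiom for bold text
        _convert_inline(child, new)
        new.tail = child.tail


def xhtml_to_docbook(xhtml_text):
    """Turn an XHTML body made of h1/h2/p into a DocBook 5 <article>.

    h1 becomes the article title (assumed to come first), each h2 opens a new
    <section>, and each p becomes a <para> in the current section, or in the
    article itself if no section has been opened yet.
    """
    root = ET.fromstring(xhtml_text)
    body = root.find(f"{{{XHTML_NS}}}body")
    article = ET.Element(_db("article"), {"version": "5.0"})
    container = article
    for el in body:
        name = _local(el.tag)
        if name == "h1":
            _convert_inline(el, ET.SubElement(article, _db("title")))
        elif name == "h2":
            container = ET.SubElement(article, _db("section"))
            _convert_inline(el, ET.SubElement(container, _db("title")))
        elif name == "p":
            _convert_inline(el, ET.SubElement(container, _db("para")))
    return ET.tostring(article, encoding="unicode")


sample = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<h1>My Book</h1><p>Intro with <em>emphasis</em>.</p>
<h2>Chapter One</h2><p>Body text.</p>
</body></html>"""
print(xhtml_to_docbook(sample))
```

Even this toy version shows where the real effort goes: every XHTML construct the editor can produce (lists, tables, images, links) needs an explicit DocBook counterpart, and anything unmapped is silently dropped.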

Please relate the solution to tools, libraries, or programs that can help me do this (if possible).


Solution

  • After checking, I found that the DocBook XSL stylesheets let you convert, in particular:

    • DocBook XML to Word XML
    • Word XML to DocBook XML
    • DocBook XML to XHTML

    I think that gives you your general solution. XSLT can be processed from many programming languages.
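    As one illustration of driving XSLT from a general-purpose language, this Python sketch uses the third-party lxml library (`pip install lxml`), whose `etree.XSLT` class compiles a stylesheet into a callable transform. The inline stylesheet is a toy written only for this example (it is not part of DocBook XSL); with the real DocBook XSL distribution you would load its XHTML stylesheet from disk with `etree.parse(...)` instead, at a path that depends on your installation. The helper name `apply_xslt` is hypothetical.

```python
from lxml import etree

# Toy XSLT 1.0 stylesheet (NOT the DocBook XSL stylesheets):
# maps a DocBook 5 article's title and paras to a minimal XHTML page.
TOY_XSL = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:db="http://docbook.org/ns/docbook"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="db">
  <xsl:template match="/db:article">
    <html><body>
      <h1><xsl:value-of select="db:title"/></h1>
      <xsl:apply-templates select="db:para"/>
    </body></html>
  </xsl:template>
  <xsl:template match="db:para">
    <p><xsl:value-of select="."/></p>
  </xsl:template>
</xsl:stylesheet>"""


def apply_xslt(xml_bytes, xsl_bytes):
    """Apply an XSLT 1.0 stylesheet to an XML document and return the result as text."""
    transform = etree.XSLT(etree.XML(xsl_bytes))  # compile the stylesheet once
    result = transform(etree.XML(xml_bytes))      # run it on the input document
    return etree.tostring(result, encoding="unicode")


doc = (b'<article xmlns="http://docbook.org/ns/docbook" version="5.0">'
       b'<title>My Book</title><para>Hello</para><para>World</para></article>')
print(apply_xslt(doc, TOY_XSL))
```

    The same stylesheet can be run unchanged by other XSLT 1.0 processors (for example `xsltproc` on the command line or `javax.xml.transform` in Java), which is what makes the XSL route portable.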

    As for your two processes, as I understand it, the difference is that in the second one you only try to detect the modifications made in the XHTML and reflect them in the DocBook XML. It will probably be easier to convert the whole document using XSL.

    I think you should tell us the context of the application you're creating; then we would know its inherent limitations and could better calibrate our answers.

    Edit: You could take inspiration from, or even adopt as your solution, Oxygen XML Editor. See http://www.oxygenxml.com/docbook_editor.html

    This editor can edit DocBook in WYSIWYG mode and import/export it in many ways.

    There's also a simpler Author version that should do all of this.