Hi, I am using JODConverter to convert documents to HTML.
I have tested converting a .doc file to HTML using OpenOffice (desktop mode) in two ways.
1st way: Using the "Save as" option
The output file has a lot of tags that are deprecated in HTML4.
2nd way: Using the "Export" option
The output file is clean, with corresponding CSS.
FYI, I am using the command below to convert .doc to HTML:
soffice --headless -convert-to html:"HTML (StarWriter)" inputfile.doc
When I convert a .doc file in OpenOffice headless mode, it uses "Save as" instead of "Export", which results in a lot of deprecated tags. I want the headless command to use "Export" instead of "Save as".
I have found that there is no way to convert HTML to HTML5 in the headless version of OpenOffice, so I used Tidy to convert the output HTML to HTML5
with the command:
tidy -c -m --indent true --doctype html5 inputfile.html
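
For reference, the two steps above can be chained in a small script. This is only a rough sketch of my current workaround, not a fix for the "Export" problem. The function names convert_and_tidy and run, and the DRY_RUN variable, are made up for illustration; the soffice and tidy invocations are exactly the ones shown above. It assumes soffice writes the .html file into the current directory under the input's base name.

```shell
#!/bin/sh
# Sketch: convert a .doc to HTML with soffice, then clean it up with tidy.
# convert_and_tidy, run and DRY_RUN are hypothetical helpers, not part of
# OpenOffice or Tidy.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

convert_and_tidy() {
    input="$1"
    # Assumption: the converted file lands in the current directory
    # as <input base name>.html.
    output="$(basename "${input%.*}").html"

    # Step 1: headless conversion (same command as in the post).
    run soffice --headless -convert-to 'html:HTML (StarWriter)' "$input" || return 1

    # Step 2: rewrite the file in place as HTML5 (same tidy options as above).
    # Note: tidy exits with status 1 when it only reports warnings.
    run tidy -c -m --indent true --doctype html5 "$output"
}
```

Usage would be something like `convert_and_tidy inputfile.doc`, or with `DRY_RUN=1` set first to only print the two commands without running them.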