How to convert pdf to docx in libreoffice 6.4?


I have LibreOffice 6.4 installed in my Ubuntu 18.04 container.

The goal is to convert a PDF file to docx.

I have already tried these commands:

libreoffice --headless --convert-to docx:"Microsoft Word 2007/2010/2013 XML" /pdf/pdf.pdf --outdir /pdf

libreoffice --headless --convert-to docx:"Microsoft Word 2007-2013 XML" /pdf/pdf.pdf --outdir /pdf

libreoffice --headless --convert-to docx:"MS Word 2007 XML" /pdf/pdf.pdf --outdir /pdf

libreoffice --headless --convert-to docx:writer_MS_Word_97 /pdf/pdf.pdf --outdir /pdf

libreoffice --headless --convert-to "docx:writer_MS_Word_2007" /pdf/pdf.pdf --outdir /pdf

libreoffice --headless --convert-to docx:writer_OOXML /pdf/pdf.pdf --outdir /pdf

libreoffice --headless --convert-to doc /pdf/pdf.pdf --outdir /pdf

libreoffice --headless --convert-to "docx:writer_MS_Word_2007" --outdir /pdf pdf.pdf

But they always return this message:

convert /pdf/pdf.pdf -> /pdf/pdf.docx using filter : writer_MS_Word_2007
Overwriting: /pdf/pdf.docx
Error: Please verify input parameters... (SfxBaseModel::impl_store <file:///pdf/pdf.docx> failed: 0x81a(Error Area:Io Class:Parameter Code:26))

Can anyone give me a clue on what's going on?


UPDATE:

I tried this command:

libreoffice --infilter="writer_pdf_import" --convert-to docx  --outdir /pdf /pdf/pdf.pdf

and it returned this message:

convert /pdf/pdf.pdf -> /pdf/pdf.docx using filter : Office Open XML Text
Overwriting: /pdf/pdf.docx

I can see it needs the --infilter parameter, since the input file is a PDF.

But it is using the Office Open XML Text filter. I need to switch it to Microsoft Word 2007-2013 XML. How can I do that?

I already tried these, and none of them work:

libreoffice --infilter="writer_pdf_import" --convert-to docx:"Microsoft Word 2007-2013 XML"  --outdir /pdf /pdf/pdf.pdf

libreoffice --infilter="writer_pdf_import" --convert-to "docx:Microsoft Word 2007-2013 XML"  --outdir /pdf /pdf/pdf.pdf

libreoffice --infilter="writer_pdf_import" --convert-to "docx:writer_MS_Word_2007"  --outdir /pdf /pdf/pdf.pdf

libreoffice --infilter="writer_pdf_import" --convert-to docx:"writer_MS_Word_2007"  --outdir /pdf /pdf/pdf.pdf

libreoffice --infilter="writer_pdf_import" --convert-to docx:writer_MS_Word_2007  --outdir /pdf /pdf/pdf.pdf

They always return this message (same as above):

convert /pdf/pdf.pdf -> /pdf/pdf.docx using filter : writer_MS_Word_2007
Overwriting: /pdf/pdf.docx
Error: Please verify input parameters... (SfxBaseModel::impl_store <file:///pdf/pdf.docx> failed: 0x81a(Error Area:Io Class:Parameter Code:26))

Solution

  • I finally figured out the workaround.

    Hopefully, this will be useful for anyone having the same issues.

    I ran an experiment, trying the possible Word filters one by one from this list. Four attempts succeeded:

    libreoffice --headless --infilter="writer_pdf_import" --convert-to docx  --outdir /pdf /pdf/pdf.pdf
    
    libreoffice --headless --infilter='writer_pdf_import' --convert-to docx:"MS Word 2007 XML" --outdir /pdf /pdf/pdf.pdf
    
    libreoffice --headless --infilter='writer_pdf_import' --convert-to doc:"MS Word 2007 XML" --outdir /pdf /pdf/pdf.pdf
    
    libreoffice --headless --infilter="writer_pdf_import" --convert-to doc  --outdir /pdf /pdf/pdf.pdf
    

    Of those four commands, the last one yields the best result: the converted document looks closest to the original. FYI, my document contains some Chinese characters and tables. The first three commands didn't draw the table borders correctly, while the last one did.
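
The trial-and-error experiment above could be scripted roughly as follows. This is only a sketch: try_spec and the SOFFICE variable are hypothetical names, not part of LibreOffice, and each attempt writes into its own made-up subdirectory so that one result does not overwrite another. I am not certain that libreoffice exits with a non-zero status when a filter fails, so the sketch assumes success can be judged by whether a non-empty output file appears.

```shell
#!/bin/sh
# Hypothetical helper: try one --convert-to spec against a PDF.
# SOFFICE may be overridden; it defaults to the "libreoffice" command used above.
SOFFICE="${SOFFICE:-libreoffice}"

# try_spec SPEC INPUT OUTROOT
# Runs one conversion into its own subdirectory and prints OK or FAIL.
try_spec() {
    spec=$1
    input=$2
    outroot=$3

    # The output extension is the part of the spec before the first colon.
    ext=${spec%%:*}
    base=$(basename "$input")
    base=${base%.*}

    # One directory per attempt, named after the spec with unsafe characters replaced.
    dir_name=$(printf '%s' "$spec" | tr -c 'A-Za-z0-9_' '_')
    outdir="$outroot/$dir_name"
    mkdir -p "$outdir"

    "$SOFFICE" --headless --infilter="writer_pdf_import" \
        --convert-to "$spec" --outdir "$outdir" "$input" >/dev/null 2>&1

    # Assumption: a non-empty output file means the conversion worked.
    if [ -s "$outdir/$base.$ext" ]; then
        echo "OK   $spec"
    else
        echo "FAIL $spec"
        return 1
    fi
}
```

For example, try_spec doc /pdf/pdf.pdf /pdf/out would run the last command from the list above and report whether /pdf/out/doc/pdf.doc was produced. A non-empty file only shows that the filter ran. The table borders still have to be checked by opening the result.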


    UPDATE:

    I decided to install LibreOffice 7.0 in the Ubuntu 18.04 container.

    To see the detailed list of filters, go here, then open one of the xcu files. The filter details are there. To use a filter, pick the value of its name attribute and use it like this:

    libreoffice --headless --infilter='writer_pdf_import' --convert-to doc:"<enter_filter_name_here>" --outdir /pdf /pdf/pdf.pdf
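
Opening the xcu files one at a time gets tedious, so the names could be pulled out with a small shell function like the one below. It is a sketch under assumptions: list_filter_names is a hypothetical helper, the directory argument is wherever you saved the xcu files, and each file is assumed to declare its filter on a single line of the form <node oor:name="...">. Files laid out differently would be skipped or misread.

```shell
#!/bin/sh
# Hypothetical helper: print the filter name declared in each .xcu file of a directory.
# Assumes the filter is declared as <node oor:name="..."> on a single line.
list_filter_names() {
    dir=$1
    for f in "$dir"/*.xcu; do
        # Skip the unexpanded pattern when the directory has no .xcu files.
        [ -f "$f" ] || continue
        # Print the first <node oor:name="..."> value, then stop reading the file.
        sed -n '/<node oor:name="/{s/.*<node oor:name="\([^"]*\)".*/\1/p;q;}' "$f"
    done
}
```

Calling list_filter_names on that directory prints one name per file. Any of those names can then be substituted for <enter_filter_name_here> in the command above.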