Tags: python, docx, libreoffice, doc, file-conversion

Having trouble using Python and LibreOffice to convert PDF to DOCX and DOC to DOCX


I have spent a good amount of time trying to determine exactly what is going wrong with the code I am using to convert PDF to DOCX (and DOC to DOCX) with LibreOffice.

I have test-run some of the relevant code I found both in the Windows Run dialog and in Python; neither works.

I have LibreOffice v6.0.2 installed on Windows.

I have been using variations of this code to try to convert some PDF files to DOCX (the specific PDF file is not really relevant):

    import subprocess
    lowriter = 'C://Program Files/LibreOffice/program/swriter.exe'
    subprocess.run('{} --invisible --convert-to docx --outdir "{}" "{}"'
                   .format(lowriter, 'dir', 'filepath.pdf'), shell=True)

Again, I have tried this both in the Windows Run dialog and through Python using the code above, with no luck. I also tried omitting --outdir, in case I was writing it incorrectly, but I always get a return code of 1:

    CompletedProcess(args='C://Program Files/LibreOffice/program/swriter.exe 
    --invisible --convert-to docx --outdir "{dir}" 
    {filepath.pdf}"', returncode=1)

Here, dir and filepath.pdf are placeholders I substituted for the real paths.

I have a similar problem with the DOC to DOCX conversion.


Solution

  • There are a number of problems here. You should first get the --convert-to call to work from the command line, as @CristiFati commented, and then implement it in Python.

    Here is the code that works on my system. Use a single / after the drive letter instead of //, and wrap the executable path in quotes because it contains spaces. Also, the folder is named LibreOffice 5 on my system.

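    import subprocess

    # Single "/" after the drive letter; adjust the folder to match your install.
    lowriter = 'C:/Program Files (x86)/LibreOffice 5/program/swriter.exe'
    # The executable path contains spaces, so it is wrapped in quotes as well.
    subprocess.run(
        '"{}" --convert-to docx --outdir "{}" "{}"'
        .format(lowriter, 'dir', 'filepath.doc'), shell=True)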
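
One way to avoid the quoting pitfalls described above is to pass the command to subprocess.run as a list of arguments instead of a single shell string, so that Python takes care of quoting paths that contain spaces. The following is a minimal sketch, not the code from the original answer; build_convert_command is a hypothetical helper name, and the paths shown in the comment are the same placeholders used above.

```python
import subprocess


def build_convert_command(office_exe, target_format, outdir, source):
    """Return the LibreOffice conversion command as an argument list.

    Each element is passed to the program as one argument, so paths
    with spaces need no manual quoting.
    """
    return [
        str(office_exe),
        "--convert-to", target_format,
        "--outdir", str(outdir),
        str(source),
    ]


def convert(office_exe, target_format, outdir, source):
    """Run the conversion and return the CompletedProcess object."""
    cmd = build_convert_command(office_exe, target_format, outdir, source)
    # No shell=True here: the list is handed to the OS without a shell.
    return subprocess.run(cmd)


# Example call (placeholders, as in the answer above):
# convert('C:/Program Files (x86)/LibreOffice 5/program/swriter.exe',
#         'docx', 'dir', 'filepath.doc')
```

With an argument list, shell=True is not needed, and the returned object still exposes returncode for checking whether the call succeeded.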
    

    Finally, it looks like converting from PDF to DOCX is not supported. However, LibreOffice Draw can open a PDF file and save it in ODG format.

    EDIT:

    Here is working code to convert from PDF. I upgraded to LO 6, so the version number ("LibreOffice 5") is no longer required in the path.

    import subprocess
    loffice = 'C:/Program Files/LibreOffice/program/soffice.exe'
    subprocess.run(
        '"{}" --convert-to odg --outdir "{}" "{}"'
        .format(loffice, 'dir', 'filepath.pdf'), shell=True)
    

    The output file is filepath.odg, written to the --outdir folder.
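
Since the question reports only a return code of 1, it may help to capture whatever the process prints when checking the call from Python. The following is a minimal sketch, assuming Python 3.7 or later (for capture_output); run_and_report is a hypothetical helper name, and whether LibreOffice writes a useful message for a failed conversion is not something confirmed here.

```python
import shutil
import subprocess


def run_and_report(cmd):
    """Run an argument list and return (returncode, stdout, stderr) as text."""
    # Fail early with a clear message if the executable cannot be found,
    # for example because of a wrong folder name in the path.
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("executable not found: {}".format(cmd[0]))
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


# Example call (placeholders, as in the answer above):
# code, out, err = run_and_report(
#     ['C:/Program Files/LibreOffice/program/soffice.exe',
#      '--convert-to', 'odg', '--outdir', 'dir', 'filepath.pdf'])
# print(code, out, err)
```

Checking that the executable exists before running separates a wrong install path from a failure inside the conversion itself.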