Search code examples
pythonpdfdocxlibreoffice

LibreOffice convert .docx to .pdf in parallel not working well


I have a lot of docx files to be converted to pdf. Converting them one by one takes long time. So I write a python scripts to convert them in parallel:

from subprocess import Popen
import time
import os

os.chdir(os.path.dirname(__file__))

output_dir = './outputs'
source_file_format = './docs/example_{}.docx'

po_list = [Popen(
    f"/Applications/LibreOffice.app/Contents/MacOS/soffice --invisible --convert-to pdf --outdir {output_dir} {source_file_format.format(i)}",
    shell=True)
    for i in range(0, 7, 1)]

while po_list:
    time.sleep(0.01)
    for i, p in enumerate(po_list):
        status = p.poll()
        if status is None:
            continue
        elif status == 0:
            print('Succeed: [{}] {} -> {}'.format(p.returncode, p.stderr, p.args))
            po_list.remove(p)
        else:
            print('Failed: {} : {}'.format(p.args, p.poll()))
            po_list.remove(p)

But each time I run this script, only a part of docx files are converted successfully. The rest conversion processes even not throw any error info.


Solution

  • We were also stuck on the same issue for some time.

    Multiple Instances of LibreOffice shares the same space using a UserInstallation directory and thus parallel conversion was creating a problem here (The intermittent processes seem to get mixed up).

    Using a different directory for each instance of libre helped to solve this issue. You may achieve this via UserInstallation env variable which can be passed as: "-env:UserInstallation=file:///d:/tmp/p0/"

    You may automate this by appending your loop variable or any unique identifier in the directory.

    Reference: https://ask.libreoffice.org/en/question/42975/how-can-i-run-multiple-instances-of-sofficebin-at-a-time/