Tags: python, linux, libreoffice, unoconv

How to use unoconv with a newer version of libreoffice


I am trying to convert encrypted documents (doc/docx) into PDF using Python.

What I do is:

  • first decrypt them temporarily in a separate folder
  • use the unoconv command line to convert the decrypted file into pdf:

unoconv -f pdf -eSelectPdfVersion=1 [path-to-file]

The conversion runs, but the appearance of the doc and docx files changes, in both the decrypted file and the PDF. The original encrypted file is not affected: I tested this by decrypting the file from a Windows client, and the decrypted file looked as it originally did.

The difference is a change in the document style that affects the number of pages. For example, a 13-page Word document is decrypted into a 14-page Word document and converted to a 14-page PDF. Similarly, a 348-page doc file becomes a 330-page doc file and then a 330-page PDF.

I discovered that there is a slight incompatibility of styles between Microsoft Word and the version of LibreOffice installed with unoconv (4.3). In my tests I noticed that fonts get replaced with LibreOffice-compatible ones that differ slightly in size from the originals.

I installed a later version of LibreOffice (5.1, 5.3), and in my tests the decrypted doc/docx file had the proper formatting and page count. However, unoconv does not use the newer version and sticks to 4.3, so the PDF still comes out with incorrect styling and page count.

I also tried calling LibreOffice directly:

soffice --headless --convert-to pdf [path-to-file] --outdir [path-to-export-directory]

But it does nothing.

  1. Is there a way to use unoconv with a LibreOffice version other than 4.3?

  2. Is there a way to make the --convert-to command work with LibreOffice 5.1 or even 5.3?


Solution

  • Here are a few steps you could try. First, uninstall the older version of LibreOffice:

    sudo apt remove 'libreoffice*'
    

    Then install the latest version of LibreOffice from the LibreOffice PPA:

    sudo add-apt-repository ppa:libreoffice/ppa
    sudo apt-get update
    sudo apt-get install libreoffice
    

    To check that LibreOffice was installed successfully, run:

    libreoffice --version
    

    This should print the version number.

    Next, install the Microsoft core fonts, so that Word documents are not laid out again with substitute fonts:

    sudo apt install ttf-mscorefonts-installer
    

    Also install any other fonts that you expect your documents to use.

    Finally, use the command below to convert to PDF. Make sure no LibreOffice application is running in the background; otherwise the headless conversion may exit without producing any output:

    libreoffice --headless --invisible --convert-to pdf "test.docx" --outdir files
    

    You should find the PDF in the folder called files.

    This works on Ubuntu 18.04.5 LTS.
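
Since the question drives the conversion from Python, the final command above could be wrapped roughly as follows. This is only a minimal sketch, not a tested drop-in: the function names `build_convert_command` and `convert_to_pdf`, the `binary` parameter, and the 120-second timeout are assumptions made for illustration, and the code assumes LibreOffice writes the PDF under the input file's base name in the output directory.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, outdir, binary="libreoffice"):
    """Build the argument list for the headless conversion shown above.

    `binary` is a hypothetical parameter: point it at a specific
    soffice/libreoffice executable if several versions are installed.
    """
    return [
        binary,
        "--headless",
        "--invisible",
        "--convert-to", "pdf",
        str(source),
        "--outdir", str(outdir),
    ]


def convert_to_pdf(source, outdir, binary="libreoffice", timeout=120):
    """Run the conversion and return the path where the PDF is expected.

    Raises FileNotFoundError if the binary is not on PATH or if no PDF
    shows up afterwards (the "it does nothing" case from the question).
    """
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary!r} not found on PATH")

    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)

    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(
        build_convert_command(source, outdir, binary),
        check=True,
        timeout=timeout,
    )

    # Assumption: the output keeps the input's base name with a .pdf suffix.
    pdf_path = outdir / (Path(source).stem + ".pdf")
    if not pdf_path.exists():
        raise FileNotFoundError(f"conversion produced no file at {pdf_path}")
    return pdf_path
```

Calling `convert_to_pdf("test.docx", "files")` would then mirror the shell command above. Checking explicitly for the output file is a deliberate choice here, because a conversion that silently produces nothing is exactly the failure described in the question.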