Search code examples
pythonpdfconvertersdoc

I need to convert .doc and .docx files to .pdf using python


I need to convert .doc and .docx files to .pdf using python . I have seen some answers already available but that are using comtypes and opening WordApplication. I can not do that. What I seek is a way of doing it using some python libraries that preserves font , tables , heading size and images etc , without opening MS Word or LibreOffice or anything like that Converting .doc and .docx files to some intermediate format(and later converting that format to pdf) would be fine too , if needed . Please help me with the code or guided instructions(I am not a pro in python) I should follow.


Solution

  • I have been in the similar problem earlier,

    My suggestion:

    sorry there is no such direct python library to handle Microsoft office formats specially (.doc)

    So try to use LibreOffice as a service in Ubuntu its "libreoffice" if windows its "soffice.exe" use this in command line to convert the document to .PDF without opening LibreOffice

    and its easy and fast too and more over gives almost perfect conversion of the file.

    A sample:

    For Windows:

        C:\Program Files (x86)\LibreOffice 4\program\soffice.exe" --headless --convert-to pdf "input_file_path" --outdir "output_dir_path"
    

    This will convert the input file into pdf in the given output directory without opening the LibreOffice ans just using it as a service.

    To run this command from python you can use "subprocess" like libraries.