Search code examples
pythonwordpressrestwordpress-rest-api

How to Publish Docx file with images to WordPress site?


So I am able to post Docx files to WordPress using WP REST-API using mammoth docx package in Python I am able to upload an image to WordPress.

But when there are images in the docx file they are not uploading on the WordPress media section.

Any input on this?

I am using python for this. Here is the code for Docx to HTML conversion

        with open(file_path, "rb") as docx_file:
            # html = mammoth.extract_raw_text(docx_file)
            result = mammoth.convert_to_html(docx_file, convert_image=mammoth.images.img_element(convert_image))
            html = result.value  # The generated HTML

kindly do note that I am able to see images in the actual published post but they have a weird source image URL & are not appearing in the WordPress media section. Weird image source URL like data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAEBAQEBAQEBAQEBAQECAgMCAgICAgQDAwIDBQQFBQUEBAQFBgcGBQUHBgQEBgkGBwgICAgIBQYJCgkICgcICAj/2wBDAQEBAQICAgQCAgQIBQQFCAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAj/wAARCAUABQADASIAAhEBAxEB/8QAHwAAAQMFAQEBAAAAAAAAAAAAAAUGBwMECAkKAgsB/8QAhxAAAQIEBAMEBQYHCAUOFggXAQIDAAQFEQYHEiETMUEIIlFhCRQ & so on

Also Huge thanks to Contributors for the Python to WordPress repo


Solution

  • The mammoth cli has a function that extracts images, saves them to a directory and inserts the file names in the img tags in the html code. If you don't want to use mammoth in command line you could use this code:

    import os
    from mammoth.cli import ImageWriter, _write_output
    
    output_dir = './output'
    filename = 'filename.docx'
    
    with open(filename, "rb") as docx_fileobj:
        convert_image = mammoth.images.img_element(ImageWriter(output_dir))
        output_filename = "{0}.html".format(os.path.basename(filename).rpartition(".")[0])
        output_path = os.path.join(output_dir, output_filename)
        
        result = mammoth.convert(
            docx_fileobj,
            convert_image=convert_image,
            output_format='html',
        )
        _write_output(output_path, result.value)
    

    Note that you would still need to change the img links as you'll be uploading the images to Wordpress, but this solves your mapping issue. You might also want to change the ImageWriter class to save the images to something else than tiff.