I am able to post docx files to WordPress through the WP REST API, using the mammoth docx package in Python, and I am also able to upload a standalone image to WordPress.
But when the docx file itself contains images, they are not uploaded to the WordPress media section.
Any input on this?
I am using Python for this. Here is the code for the docx to HTML conversion:
with open(file_path, "rb") as docx_file:
    # html = mammoth.extract_raw_text(docx_file)
    result = mammoth.convert_to_html(docx_file, convert_image=mammoth.images.img_element(convert_image))
    html = result.value  # The generated HTML
Kindly note that I can see the images in the actual published post, but they have a weird source URL and do not appear in the WordPress media section.
Also, huge thanks to the contributors for the Python to WordPress repo.
The mammoth CLI has a function that extracts images, saves them to a directory, and inserts the file names in the img tags in the HTML code. If you don't want to use mammoth on the command line, you could use this code:
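import os
import mammoth
from mammoth.cli import ImageWriter, _write_output

output_dir = './output'
filename = 'filename.docx'
os.makedirs(output_dir, exist_ok=True)  # ImageWriter does not create the directory

with open(filename, "rb") as docx_fileobj:
    # Save each image to output_dir and use its file name as the img src
    convert_image = mammoth.images.img_element(ImageWriter(output_dir))
    output_filename = "{0}.html".format(os.path.basename(filename).rpartition(".")[0])
    output_path = os.path.join(output_dir, output_filename)
    result = mammoth.convert(docx_fileobj, convert_image=convert_image, output_format="html")
    # Write the generated HTML next to the extracted images
    _write_output(output_path, result.value)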
Note that you would still need to change the img links, since you'll be uploading the images to WordPress, but this solves your mapping issue. You might also want to change the ImageWriter class to save the images as something other than TIFF.
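One way to avoid rewriting the img links afterwards might be a variant of ImageWriter that uploads each image instead of saving it, and returns the uploaded URL as the src. The sketch below is only an illustration under several assumptions: the names UploadingImageWriter and make_wordpress_uploader are made up for this example, the site is assumed to accept Basic auth with an application password, and the media endpoint is assumed to return a JSON body with a source_url field. It has not been run against a live site.

```python
import base64
import json
import urllib.request


class UploadingImageWriter:
    """Callable for mammoth.images.img_element that uploads images instead of saving them.

    Hypothetical helper, modelled on mammoth.cli.ImageWriter.
    """

    def __init__(self, upload):
        # upload(filename, content_type, data) must return the image URL (assumed hook)
        self._upload = upload
        self._image_number = 1

    def __call__(self, element):
        # mammoth passes an image element exposing content_type and open()
        extension = element.content_type.partition("/")[2]
        filename = "{0}.{1}".format(self._image_number, extension)
        with element.open() as image_source:
            data = image_source.read()
        self._image_number += 1
        # The returned dict becomes the attributes of the generated img tag
        return {"src": self._upload(filename, element.content_type, data)}


def make_wordpress_uploader(site_url, user, app_password):
    """Build an upload function that posts raw image bytes to /wp-json/wp/v2/media."""
    credentials = "{0}:{1}".format(user, app_password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")

    def upload(filename, content_type, data):
        request = urllib.request.Request(
            site_url.rstrip("/") + "/wp-json/wp/v2/media",
            data=data,
            method="POST",
            headers={
                "Authorization": "Basic " + token,
                "Content-Type": content_type,
                "Content-Disposition": 'attachment; filename="{0}"'.format(filename),
            },
        )
        with urllib.request.urlopen(request) as response:
            # Assumption: the created media item reports its public URL as source_url
            return json.loads(response.read().decode("utf-8"))["source_url"]

    return upload


# Possible usage (placeholder site and credentials):
# writer = UploadingImageWriter(make_wordpress_uploader("https://example.com", "user", "app-password"))
# result = mammoth.convert_to_html(docx_file, convert_image=mammoth.images.img_element(writer))
```

Because the upload function is passed in, the writer can be tried out with a stub before pointing it at a real site, and images uploaded this way should then show up in the media section.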