Does anyone know how to extract or copy images from a .rtf file?
I have tried to look for a solution, but from what I found, all of the libraries and articles people cite either no longer exist or have no documentation.
Since I didn't find a straightforward way to extract images from a .rtf file, I came up with a workaround.
I used the win32com lib to open the file and then saved it as a .docx:
import win32com.client

word = win32com.client.Dispatch('Word.Application')
doc = word.Documents.Open(RtfFilePath)
# FileFormat=16 is wdFormatDocumentDefault, which saves as .docx
doc.SaveAs(saveDocxPath, FileFormat=16)
doc.Close()
word.Quit()
This way you can use docx2txt and other libraries that extract images from Word files:
import docx2txt
text = docx2txt.process("/path/your_word_doc.docx", '/home/example/img/')
I have also found that some images are stored as .wmf, and these files can't be extracted this way. As a workaround, I unpack the .docx (which is a ZIP archive) with the tar command:
import subprocess
subprocess.run(["tar", "-x", "-f", FileToExtract, "-C", TargetFolder], check=True)
The extracted images will be located in TargetFolder\word\media. You can convert them to another image type using the Pillow library (as far as I know, Pillow can only read .wmf files on Windows) with this code:
from PIL import Image
Image.open("image.wmf").save("image.png")
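If you'd rather not shell out to tar, the same unpacking step can probably be done with Python's standard zipfile module, since a .docx is a ZIP archive. This is only a rough sketch of that alternative, not something from the steps above: the function name extract_media and its parameters are made up for illustration, and it assumes the images sit under word/media/ inside the archive, as they did in my files.

```python
import zipfile
from pathlib import Path


def extract_media(docx_path, target_dir):
    """Copy every file under word/media/ in a .docx into target_dir.

    Hypothetical helper: the name and parameters are illustrative only.
    Returns the list of paths that were written.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Assumption: embedded images live under word/media/ in the archive.
            if name.startswith("word/media/") and not name.endswith("/"):
                out_path = target / Path(name).name
                out_path.write_bytes(archive.read(name))
                saved.append(out_path)
    return saved
```

Calling extract_media(saveDocxPath, TargetFolder) should leave the image files (including any .wmf ones) directly in TargetFolder, without the word\media subfolders, so you can run the Pillow conversion on them from there.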