
Extracting images from a .RTF file with Python

Does anyone know how to extract or copy images from a .rtf file?

I have tried to find a solution, but all of the libraries and articles people cite either no longer exist or have no documentation.


  • Since I didn't find a straightforward way to extract images from a .rtf file, I came up with a workaround.

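    I used the win32com library (part of pywin32) to open the file in Word and then saved it as a .docx:

    import win32com.client
    word = win32com.client.Dispatch('Word.Application')
    doc = word.Documents.Open(RtfFilePath)
    doc.SaveAs(saveDocxPath, FileFormat=16)  # 16 = wdFormatDocumentDefault (.docx)
    doc.Close()
    word.Quit()  # shut down the background Word instance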

    This way you can use docx2txt and other libraries that extract images from Word files:

    import docx2txt
    text = docx2txt.process("/path/your_word_doc.docx", '/home/example/img/')

    I also found that some images are stored as .wmf files, and these can't be extracted this way. As a workaround, I unpacked the .docx archive with the tar command:

    tar -x -f {FileToExtract} -C {TargetFolder}

    The extracted images will be located in TargetFolder\word\media. You can convert them to another image format using the Pillow library with this code:

    from PIL import Image
    Image.open("image.wmf").save("image.png")
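
    The unpacking step above works because a .docx file is a ZIP archive. As an alternative to shelling out to tar, here is a minimal sketch that does the same with Python's standard zipfile module. The function name extract_docx_media and its parameters are made up for illustration and are not part of any library; it copies everything under word/media/ (including .wmf files) into a flat target folder.

```python
import zipfile
from pathlib import Path


def extract_docx_media(docx_path, target_dir):
    """Copy every file stored under word/media/ in a .docx into target_dir.

    Hypothetical helper for illustration; returns the list of written paths.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(docx_path) as archive:
        for info in archive.infolist():
            # Skip directory entries and anything outside word/media/.
            if info.is_dir() or not info.filename.startswith("word/media/"):
                continue
            # Keep only the file name so the output folder stays flat.
            out_path = target / Path(info.filename).name
            out_path.write_bytes(archive.read(info))
            saved.append(out_path)
    return saved
```

    Calling extract_docx_media(saveDocxPath, TargetFolder) after the SaveAs step should leave the images directly in TargetFolder rather than in TargetFolder\word\media, so the Pillow conversion can be pointed at those files.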