
Extracting images from a .RTF file with Python


Does anyone know how to extract or copy images from a .rtf file?

I have tried to find a solution, but all of the libraries and articles people cite either no longer exist or have no usable documentation.


Solution

  • Since I didn't find a straightforward way to extract images from a .rtf file, I came up with a workaround.

    I used the win32com library (part of pywin32) to open the file in Word and save it as a .docx:

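    import win32com.client

    # Needs Windows with Microsoft Word installed; use absolute paths
    word = win32com.client.Dispatch('Word.Application')
    doc = word.Documents.Open(RtfFilePath)
    doc.SaveAs(saveDocxPath, FileFormat=16)  # 16 = wdFormatDocumentDefault (.docx)
    doc.Close()
    word.Quit()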
    

    This way you can use docx2txt or other libraries that extract images from Word files:

    import docx2txt
    text = docx2txt.process("/path/your_word_doc.docx", '/home/example/img/')  # returns the text; images go to the folder
    

    I also found that some images are stored as .wmf files, which can't be extracted this way. As a workaround, I unpacked the .docx itself (it is a ZIP archive) with tar; the tar.exe included with Windows 10 and later can read ZIP files:

    import subprocess
    subprocess.run(["tar", "-x", "-f", FileToExtract, "-C", TargetFolder], check=True)  # TargetFolder must already exist
    

    The extracted images will be in TargetFolder\word\media. You can convert them to other image formats with the Pillow library (note that Pillow can only read .wmf files on Windows):

    from PIL import Image
    
    Image.open("image.wmf").save("image.png")
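Since a .docx file is a ZIP archive, the tar step could also be done in pure Python with the standard `zipfile` module, which works on any OS and does not depend on which `tar` is installed. This is a minimal sketch, assuming the .docx was already produced by the Word conversion; `extract_docx_media` is a hypothetical helper name, not part of any library.

```python
import os
import zipfile


def extract_docx_media(docx_path, target_folder):
    """Copy every file under word/media/ in a .docx into target_folder.

    Unlike an extension filter, this keeps .wmf/.emf files as well as
    .png/.jpg. Returns the list of written file paths.
    """
    os.makedirs(target_folder, exist_ok=True)
    saved = []
    with zipfile.ZipFile(docx_path) as docx:
        for name in docx.namelist():
            # Word stores embedded images under word/media/ inside the archive
            if name.startswith("word/media/") and not name.endswith("/"):
                dest = os.path.join(target_folder, os.path.basename(name))
                with open(dest, "wb") as out:
                    out.write(docx.read(name))
                saved.append(dest)
    return saved
```

For example, `extract_docx_media(saveDocxPath, TargetFolder)` writes the media files straight into `TargetFolder` (without the `word\media` subfolders), after which they can be converted with Pillow as shown above.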
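If Word isn't available (for example on Linux), a different technique is to skip the .docx conversion and read the pictures straight out of the RTF. In RTF, each picture is a `{\pict ...}` group whose data is usually written as hex digits, and a control word such as `\pngblip`, `\jpegblip`, `\emfblip` or `\wmetafile8` gives its format. The sketch below is a rough, hypothetical parser (`extract_rtf_pictures` and `save_rtf_pictures` are illustrative names, not a library API). It assumes hex-encoded data, does not handle binary data introduced with `\bin`, skips other picture types, and is untested against real Word output. Word-generated RTF often also contains a fallback copy of each picture inside a `\nonshppict` group, so expect duplicates, and the raw WMF bytes may need extra handling before some tools will open them.

```python
import os
import re

# RTF picture-type control words mapped to file extensions (not exhaustive;
# types not listed here, such as \dibitmap, are skipped by this sketch).
BLIP_TYPES = {
    "pngblip": "png",
    "jpegblip": "jpg",
    "emfblip": "emf",
    "wmetafile": "wmf",
}

# A control word: backslash, lowercase letters, optional numeric parameter,
# optional single space delimiter.
CONTROL_WORD = re.compile(r"\\([a-z]+)-?\d* ?")


def extract_rtf_pictures(rtf):
    """Return a list of (extension, bytes) for each hex-encoded {\\pict ...} group."""
    pictures = []
    for match in re.finditer(r"\{\\pict\b", rtf):
        # Walk to the brace closing this \pict group, keeping only the text at
        # its top level; nested groups such as {\*\blipuid ...} are dropped.
        depth, i, top_level = 0, match.start(), []
        while i < len(rtf):
            ch = rtf[i]
            if ch == "\\" and rtf[i + 1:i + 2] in ("{", "}", "\\"):
                i += 2  # escaped literal brace or backslash: never image data
                continue
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    break
            elif depth == 1:
                top_level.append(ch)
            i += 1
        body = "".join(top_level)
        words = CONTROL_WORD.findall(body)
        ext = next((BLIP_TYPES[w] for w in words if w in BLIP_TYPES), None)
        if ext is None:
            continue  # unsupported picture type
        # Strip control words, then keep only the hex digits of the picture data.
        hex_data = re.sub(r"[^0-9a-fA-F]", "", CONTROL_WORD.sub("", body))
        pictures.append((ext, bytes.fromhex(hex_data)))
    return pictures


def save_rtf_pictures(rtf_path, out_dir):
    """Write each picture found in rtf_path to out_dir as image1.<ext>, image2.<ext>, ..."""
    with open(rtf_path, encoding="latin-1") as f:  # RTF is essentially 8-bit text
        rtf = f.read()
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for n, (ext, data) in enumerate(extract_rtf_pictures(rtf), start=1):
        path = os.path.join(out_dir, f"image{n}.{ext}")
        with open(path, "wb") as out:
            out.write(data)
        saved.append(path)
    return saved
```

For example, `save_rtf_pictures("input.rtf", "images")` (placeholder paths) would write files named image1.png, image2.wmf and so on into `images/`, depending on what the RTF contains.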