
With openpyxl, how can I get image information that is not available through sheet._images?


I have a .xlsx file that contains images in one of its worksheets, 'sheet'. These images are visible in Excel. However, if I open that sheet with openpyxl, the list sheet._images is empty.

Unfortunately, I cannot share that .xlsx file publicly, and I do not know how to create a .xlsx file with this odd characteristic.
Can anyone tell me how to get the image information from such a .xlsx file?

I tried to get the image information from sheet._images, but that failed because sheet._images is an empty list. I expected to get information about the images in that worksheet.


Solution

  • In Excel sheets, images are usually displayed as shapes in a sheet drawing. The sheet drawing is a layer that hovers over the sheet cells. Shapes in this layer are anchored to cells but are not cell content. Such images, and only such images, are read by current openpyxl and exposed in the list Worksheet._images.

    Since Microsoft Office 365, images can also be the result of a cell formula that uses the IMAGE function. Furthermore, images can be placed in cells as the cell content; see "Insert Picture in-cell in Excel". Both types of images are outside the sheet drawing and thus are not read by current openpyxl.

    Office Open XML, the file format of *.xlsx files, is essentially a ZIP archive. For *.xlsx files there is a directory /xl/media in that ZIP archive. This directory contains an image file for each image embedded in the *.xlsx workbook. These include images in sheet drawings, images as cell content (from =IMAGE(...) as well as from in-cell-inserted pictures), images in headers/footers and so on. Thus, if one unzips the *.xlsx file, one will find all embedded images in /xl/media.
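
The unzip approach described above can be sketched with Python's standard zipfile module alone, without openpyxl. This is a minimal sketch, not a definitive implementation: the function names list_embedded_media and extract_embedded_media are hypothetical, and it assumes the ZIP member names start with xl/media/ (no leading slash), which is how such entries normally appear in the archive listing. It only finds the image files; it does not tell which sheet or cell an image belongs to.

```python
import sys
import zipfile
from pathlib import Path, PurePosixPath

# Assumption: embedded images are stored under this prefix inside the archive.
MEDIA_PREFIX = "xl/media/"


def _is_media_file(member_name):
    """True for archive members that are files below xl/media/."""
    return member_name.startswith(MEDIA_PREFIX) and not member_name.endswith("/")


def list_embedded_media(xlsx_path):
    """Return the archive member names of all files stored under xl/media/."""
    with zipfile.ZipFile(xlsx_path) as archive:
        return [name for name in archive.namelist() if _is_media_file(name)]


def extract_embedded_media(xlsx_path, target_dir):
    """Copy every file under xl/media/ into target_dir; return the written paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(xlsx_path) as archive:
        for name in archive.namelist():
            if not _is_media_file(name):
                continue
            # Keep only the file name, so nothing is written outside target_dir.
            destination = target / PurePosixPath(name).name
            destination.write_bytes(archive.read(name))
            written.append(destination)
    return written


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python extract_media.py workbook.xlsx output_dir
    for path in extract_embedded_media(sys.argv[1], sys.argv[2]):
        print(path)
```

If list_embedded_media returns file names while sheet._images is empty, the images are most likely stored as cell content or in headers/footers rather than in the sheet drawing.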