I'm having a hard time trying to access/save images using the python-pptx library. If the image is of shape type PICTURE (that is, shape.shape_type == MSO_SHAPE_TYPE.PICTURE), I can access/save the image easily using the 'blob' attribute. Here is the code:
import argparse
import os
from PIL import Image
import pptx
from pptx.enum.shapes import MSO_SHAPE_TYPE
from pptx import Presentation
from mdutils.mdutils import MdUtils
from mdutils import Html


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('ppt_name', type=str, help='add the name of the PowerPoint file (NOTE: the folder must be in the same directory as the program file)')
    args = parser.parse_args()
    pptx_name = args.ppt_name
    pptx_name_formatted = pptx_name.split('.')[0]
    prs = Presentation(pptx_name)
    path = '{}_converted'.format(pptx_name_formatted)
    if not os.path.exists(path):
        os.mkdir(path)
    images_folder = '{}_images'.format(pptx_name_formatted)
    images_path = os.path.join(path, images_folder)
    if not os.path.exists(images_path):
        os.mkdir(images_path)
    ppt_dict = {}  # Keys: slide numbers, values: slide content
    texts = []
    slide_count = 0
    picture_count = 0
    for slide in prs.slides:
        texts = []
        slide_count += 1
        for shape in slide.shapes:
            if shape.has_text_frame:
                if '\n' in shape.text:
                    splitted = shape.text.split('\n')
                    for word in splitted:
                        if word != '':
                            texts.append(word)
                elif shape.text == '':
                    continue
                else:
                    texts.append(shape.text)
            elif shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
                with open('{}/image{}_slide{}.png'.format(images_path, picture_count, slide_count), 'wb') as f:
                    f.write(shape.image.blob)
                picture_count += 1
        ppt_dict[slide_count] = texts
    ppt_content = ''
    for k, v in ppt_dict.items():
        ppt_content = ppt_content + ' - Slide number {}\n'.format(k)
        for a in v:
            ppt_content = ppt_content + '\t - {}\n'.format(a)
    mdFile = MdUtils(file_name='{}/{}'.format(path, path))  # second argument isn't path, it just shares the path name.
    mdFile.write(ppt_content)
    mdFile.create_md_file()


if __name__ == "__main__":
    main()
The problem is when the picture is of shape type 'auto shape'. I tried a lot of approaches, but to no avail. When I run the following code for a shape that I know is a picture:
    if shape.shape_type == MSO_SHAPE_TYPE.AUTO_SHAPE:
        print(shape.auto_shape_type)
        print(shape.fill.type)
    # indented because it's in a for loop
It outputs RECTANGLE for shape.auto_shape_type and PICTURE for shape.fill.type.
What I want now is to save the picture (perhaps by writing out the image's binary byte stream). Can someone help?
The "link" to the image (part, having the blob) is in the fill definition. Using that you can get to the image.
Print out the XML for the surroundings of the fill definition with shape.fill._xPr.xml. That will give you a look at what you need to navigate to. There's a good chance it will contain something like "rId9", with some other number in place of the "9", probably in the neighborhood of something like "blipfill". The image is used as the "fill" of the shape, so that's what's going on here.
Then get the slide part with something like slide._part and use its .related_parts "dict" to look up the image "fill" part using the relationship id (the string like "rId9"):
image_part = slide._part.related_parts["rId9"]
The ImagePart implementation is here:
https://github.com/scanny/python-pptx/blob/master/pptx/parts/image.py#L21
It gives access to the image and a lot of details about it as well.
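Once you have the relationship id, the lookup-and-save step can be sketched roughly as follows. save_related_image and its parameters are names made up for illustration. The sketch uses slide.part, the public spelling of slide._part. The related_part(rId) fallback is an assumption about newer python-pptx releases, so check it against the version you have installed. The file extension is taken from the image itself rather than assumed to be .png.

```python
import os


def save_related_image(slide, rId, out_dir, stem):
    """Write the image related to `slide` by `rId` into `out_dir`.

    `stem` is the file name without its extension (hypothetical parameter).
    Returns the path that was written.
    """
    slide_part = slide.part
    if hasattr(slide_part, "related_parts"):
        # Dict-style lookup, as described in the answer above.
        image_part = slide_part.related_parts[rId]
    else:
        # Assumption: newer releases expose a method-style lookup instead.
        image_part = slide_part.related_part(rId)

    # The image object carries the bytes plus details such as the extension.
    image = image_part.image
    out_path = os.path.join(out_dir, "{}.{}".format(stem, image.ext))
    with open(out_path, "wb") as f:
        f.write(image.blob)
    return out_path
```

In the question's loop this would be called with something like save_related_image(slide, "rId9", images_path, "image0_slide1"), with the real relationship id in place of "rId9".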
You'll have to retrieve the "rId9"-like string using lxml calls, something roughly like:
rIds = shape.fill._xPr.xpath(".//@embed")
rId = rIds[0]
You'll need to do a little research on XPath to work out the right expression, based on the XML you printed out in the earlier step. There's a lot out there on XPath, including here on SO. This is one resource to get started: http://www.rpbourret.com/xml/XPathIn5.htm
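One thing to watch for when working out the expression: in the XML I'd expect here, the embed attribute carries the "r:" namespace prefix, and in XPath an unprefixed @embed only matches attributes that have no namespace. So the expression most likely has to be namespace-aware. Below is a minimal sketch that uses only the standard library's xml.etree.ElementTree (not lxml), so it can be tried without a presentation at hand. The SAMPLE_SPPR_XML fragment and the find_embed_rids name are made up for illustration, and your printed XML may be shaped differently.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by PowerPoint XML for these prefixes.
NS = {
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
    "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
}

# Illustrative fragment only: roughly what a picture-filled rectangle's
# shape properties might look like when printed.
SAMPLE_SPPR_XML = """\
<p:spPr xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
        xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"
        xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <a:prstGeom prst="rect"><a:avLst/></a:prstGeom>
  <a:blipFill>
    <a:blip r:embed="rId9"/>
    <a:stretch><a:fillRect/></a:stretch>
  </a:blipFill>
</p:spPr>
"""


def find_embed_rids(xml_text):
    """Return the r:embed values of all a:blip elements in `xml_text`."""
    root = ET.fromstring(xml_text)
    # ElementTree spells a namespaced attribute as "{uri}localname".
    embed_key = "{%s}embed" % NS["r"]
    rids = []
    for blip in root.iterfind(".//a:blip", NS):
        rid = blip.get(embed_key)
        if rid is not None:
            rids.append(rid)
    return rids


print(find_embed_rids(SAMPLE_SPPR_XML))
```

If the string you get from shape.fill._xPr.xml has the same shape as the sample, passing it to find_embed_rids() should give you the relationship id to look up. With lxml, a namespace-agnostic expression such as .//@*[local-name()='embed'] is another option.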
If you can't work it out, post the XML you printed out and we can get you to the next step.