How to extract text from a text shape within a group shape in PowerPoint using python-pptx


My PowerPoint slide has a number of group shapes in which there are child text shapes.

Earlier I was using this code, but it doesn't handle Group shapes.

from pptx import Presentation

# `files` (paths to .pptx files) and `text_list` are assumed to be defined earlier.
for eachfile in files:
    prs = Presentation(eachfile)

    textrun = []
    for slide in prs.slides:
        for shape in slide.shapes:
            if hasattr(shape, "text"):
                print(shape.text)
                textrun.append(shape.text)
    new_list = " ".join(textrun)
    text_list.append(new_list)

I am trying to extract the text from these child text boxes. I tried to reach the child elements through GroupShape.shapes, but I get an error saying they are of type 'property', so I cannot access the text or iterate over them (TypeError: 'property' object is not iterable).

from pptx.shapes.group import GroupShape
from pptx import Presentation

for eachfile in files:
    prs = Presentation(eachfile)

    textrun = []
    for slide in prs.slides:
        for shape in slide.shapes:
            for text in GroupShape.shapes:
                print(text)

I would then like to capture the text and append it to a string for further processing.

So my question is: how do I access the child text elements and extract the text from them?

I have spent a lot of time going through the documentation and source code, but haven't been able to figure it out. Any help would be appreciated.


Solution

  • I think you need something like this:

    from pptx.enum.shapes import MSO_SHAPE_TYPE
    
    for slide in prs.slides:
        # ---only operate on group shapes---
        group_shapes = [
            shp for shp in slide.shapes
            if shp.shape_type == MSO_SHAPE_TYPE.GROUP
        ]
        for group_shape in group_shapes:
            for shape in group_shape.shapes:
                if shape.has_text_frame:
                    print(shape.text)
    

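    A group shape contains other shapes, accessible on its .shapes property. It does not itself have a .text property. So you need to iterate the shapes in the group and get the text from each of those. The TypeError in the question arises because GroupShape.shapes is looked up on the class rather than on a group shape instance, so Python returns the property object itself instead of the collection of child shapes.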

    Note that this solution only goes one level deep. A recursive approach could be used to walk the tree depth-first and get text from groups containing groups if there were any.

    Also note that not all shapes have text, so you must check the .has_text_frame property to avoid raising an exception on, say, a picture shape.
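
The recursive approach mentioned above could look roughly like the sketch below. The helper names iter_text_shapes and slide_text are hypothetical, not part of python-pptx. As an assumption made for this sketch, a group is detected by the presence of a .shapes attribute rather than by comparing shape_type with MSO_SHAPE_TYPE.GROUP; that lets the walker be tried on simple stand-in objects without a .pptx file, and the shape_type check shown earlier should work just as well.

```python
from types import SimpleNamespace


def iter_text_shapes(shapes):
    """Yield every shape that has a text frame, walking groups depth-first."""
    for shape in shapes:
        if hasattr(shape, "shapes"):
            # Assumption: only group shapes expose a .shapes collection,
            # so recurse into it instead of reading text from the group.
            yield from iter_text_shapes(shape.shapes)
        elif getattr(shape, "has_text_frame", False):
            # Pictures and similar shapes report has_text_frame == False
            # (or lack the attribute on stand-ins) and are skipped.
            yield shape


def slide_text(slide):
    """Join the text of all text-bearing shapes on one slide with spaces."""
    return " ".join(shape.text for shape in iter_text_shapes(slide.shapes))


# Stand-in objects for illustration only; these are NOT python-pptx classes.
title = SimpleNamespace(has_text_frame=True, text="Title")
picture = SimpleNamespace(has_text_frame=False)
inner_group = SimpleNamespace(
    shapes=[SimpleNamespace(has_text_frame=True, text="nested")]
)
outer_group = SimpleNamespace(
    shapes=[picture, inner_group, SimpleNamespace(has_text_frame=True, text="child")]
)
demo_slide = SimpleNamespace(shapes=[title, outer_group])

print(slide_text(demo_slide))  # prints: Title nested child
```

With a real presentation, the same helper would be called as slide_text(slide) for each slide in prs.slides, and the results appended to a list as in the question's original loop.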