python xml powerpoint accessibility python-pptx

Check if image is decorative in powerpoint using python-pptx

The company I work at requires a list of all inaccessible images/shapes in a .pptx document, meaning those that have no alt text and aren't marked decorative. To automate the process, I'm writing a script that finds every inaccessible image/shape in a specified .pptx and compiles the list. So far, I've managed to make it print the name, slide number, and image blob of each image with no alt text.

Unfortunately, after searching the docs extensively, I found that the python-pptx package has no built-in way to check whether an image/shape is decorative.

I haven't mapped XML elements to objects in the past and was wondering how I could go about making a function that reads the val attribute within the adec:decorative element in this .pptx file (see line 4).

<p:cNvPr id="3" name="Picture 2">
    <a:extLst>
        <a:ext uri="{FF2B5EF4-FFF2-40B4-BE49-F238E27FC236}"><a16:creationId xmlns:a16="http://schemas.microsoft.com/office/drawing/2014/main" id="{77922398-FA3E-426B-895D-97239096AD1F}" /></a:ext>
        <a:ext uri="{C183D7F6-B498-43B3-948B-1728B52AA6E4}"><adec:decorative xmlns:adec="http://schemas.microsoft.com/office/drawing/2017/decorative" val="0" /></a:ext>
    </a:extLst>
</p:cNvPr>

Since I've only recently started using this package, I'm not sure how to go about creating custom element classes within python-pptx. If anyone has any other workaround or suggestions please let me know, thank you!

Solution

Creating a custom element class would certainly work, but I would regard it as an extreme method (think bazooka for killing mosquitos) :).

I'd be inclined to think you could accomplish what you want with an XPath query on the closest ancestor you can get to with python-pptx.

Something like this would be in the right direction:

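# `shape` is a python-pptx shape; `_nvXxPr` is a private property and may change between versions
cNvPr = shape._element._nvXxPr.cNvPr
# xpath() returns a list of matching elements (empty when there are none)
adec_decoratives = cNvPr.xpath(".//adec:decorative")
if adec_decoratives:
    print("got one, probably need to look more closely at them")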

One of the challenges is likely to be getting the adec namespace prefix registered because I don't think it is by default. So you probably need to execute this code before the XPath expression, possibly before loading the first document:

from pptx.oxml.ns import _nsmap

# register the "adec" prefix so XPath expressions can use it
_nsmap["adec"] = "http://schemas.microsoft.com/office/drawing/2017/decorative"
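To see the prefix mapping in isolation, here is a minimal sketch that runs the same query against the fragment from the question. It uses the standard library's xml.etree.ElementTree rather than python-pptx, so it runs without a .pptx file; the NAMESPACES dict plays the role that the extra _nsmap entry plays in python-pptx. The names NAMESPACES, SAMPLE_CNVPR and find_decorative_elements are made up for illustration, and the xmlns:p / xmlns:a declarations were added to the fragment because in a real slide they are declared on an ancestor element.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map passed to the query; without the "adec" entry the
# prefixed expression below cannot be resolved.
NAMESPACES = {
    "adec": "http://schemas.microsoft.com/office/drawing/2017/decorative",
}

# Fragment from the question (creationId extension omitted for brevity),
# with xmlns:p and xmlns:a added so it parses on its own.
SAMPLE_CNVPR = """
<p:cNvPr xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"
         xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
         id="3" name="Picture 2">
    <a:extLst>
        <a:ext uri="{C183D7F6-B498-43B3-948B-1728B52AA6E4}">
            <adec:decorative
                xmlns:adec="http://schemas.microsoft.com/office/drawing/2017/decorative"
                val="0" />
        </a:ext>
    </a:extLst>
</p:cNvPr>
"""


def find_decorative_elements(cnvpr_xml):
    """Return every <adec:decorative> descendant of the given cNvPr XML."""
    root = ET.fromstring(cnvpr_xml.strip())
    return root.findall(".//adec:decorative", NAMESPACES)


found = find_decorative_elements(SAMPLE_CNVPR)
decorative_vals = [element.get("val") for element in found]
print(decorative_vals)
```

With python-pptx the element comes from the loaded presentation instead of a string, but the expression and the namespace URI are the same.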

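Also, if you research XPath a bit, I think you'll be able to select only the <adec:decorative> elements whose val attribute has the state you're looking for, using an attribute predicate such as .//adec:decorative[@val='1']. The fragment in your question has val="0", so check which value PowerPoint writes when a shape is actually marked decorative.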
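A rough sketch of that attribute test, again using the standard library's xml.etree.ElementTree so it can run on its own. It assumes that val="1" is what marks a shape as decorative and that alt text lives in the descr attribute of cNvPr; verify both against a file saved from your version of PowerPoint. The helpers make_cnvpr, is_decorative and is_inaccessible are hypothetical names, and make_cnvpr only builds test elements in place of real shapes.

```python
import xml.etree.ElementTree as ET

P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
ADEC_NS = "http://schemas.microsoft.com/office/drawing/2017/decorative"
NAMESPACES = {"adec": ADEC_NS}


def make_cnvpr(name, descr=None, decorative_val=None):
    """Build a stand-in <p:cNvPr> element (hypothetical test helper)."""
    cnvpr = ET.Element(f"{{{P_NS}}}cNvPr", {"id": "1", "name": name})
    if descr is not None:
        cnvpr.set("descr", descr)
    if decorative_val is not None:
        ext_lst = ET.SubElement(cnvpr, f"{{{A_NS}}}extLst")
        ext = ET.SubElement(
            ext_lst,
            f"{{{A_NS}}}ext",
            {"uri": "{C183D7F6-B498-43B3-948B-1728B52AA6E4}"},
        )
        ET.SubElement(ext, f"{{{ADEC_NS}}}decorative", {"val": decorative_val})
    return cnvpr


def is_decorative(cnvpr):
    """True if an <adec:decorative val="1"> element is present (assumed state)."""
    return cnvpr.find(".//adec:decorative[@val='1']", NAMESPACES) is not None


def is_inaccessible(cnvpr):
    """True if the shape has no alt text and is not marked decorative."""
    has_alt_text = bool((cnvpr.get("descr") or "").strip())
    return not has_alt_text and not is_decorative(cnvpr)


shapes = [
    make_cnvpr("Picture 1"),                      # no alt text, no marker
    make_cnvpr("Picture 2", decorative_val="0"),  # explicitly not decorative
    make_cnvpr("Picture 3", decorative_val="1"),  # marked decorative
    make_cnvpr("Picture 4", descr="A chart"),     # has alt text
]
inaccessible_names = [s.get("name") for s in shapes if is_inaccessible(s)]
print(inaccessible_names)
```

The same predicate string should also work with the cNvPr.xpath() call shown earlier, once the adec prefix is registered.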

But this is the direction I recommend. Maybe you can post your results once you've worked them out in case someone else faces the same problem later.