The company I work at requires a list of all inaccessible images/shapes in a .pptx document (those that don't have alt text and aren't marked decorative). To automate the process, I'm writing a script that extracts all inaccessible images/shapes from a specified .pptx and compiles them into a list. So far, I've managed to make it print the name, slide number, and image blob of every image with no alt text.
Unfortunately, after extensively searching the docs, I found that the python-pptx package does not support checking whether an image/shape is decorative.
I haven't mapped XML elements to objects before, and I was wondering how I could write a function that reads the val attribute of the adec:decorative element in this .pptx file (see line 4):
<p:cNvPr id="3" name="Picture 2">
<a:extLst>
<a:ext uri="{FF2B5EF4-FFF2-40B4-BE49-F238E27FC236}"><a16:creationId xmlns:a16="http://schemas.microsoft.com/office/drawing/2014/main" id="{77922398-FA3E-426B-895D-97239096AD1F}" /></a:ext>
<a:ext uri="{C183D7F6-B498-43B3-948B-1728B52AA6E4}"><adec:decorative xmlns:adec="http://schemas.microsoft.com/office/drawing/2017/decorative" val="0" /></a:ext>
</a:extLst>
</p:cNvPr>
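To illustrate what reading val involves, here is a minimal sketch that uses the standard-library xml.etree.ElementTree (not python-pptx) on the fragment above. The excerpt does not declare the p: and a: prefixes, so the sketch assumes the usual PresentationML and DrawingML namespace URIs and adds them to the root element; the adec URI is copied from the XML itself.

```python
import xml.etree.ElementTree as ET

# The <p:cNvPr> fragment above, with the p: and a: prefixes declared on the root
# (assumed standard OOXML URIs) so that it parses as a well-formed document.
SNIPPET = """\
<p:cNvPr xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"
         xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
         id="3" name="Picture 2">
  <a:extLst>
    <a:ext uri="{FF2B5EF4-FFF2-40B4-BE49-F238E27FC236}"><a16:creationId xmlns:a16="http://schemas.microsoft.com/office/drawing/2014/main" id="{77922398-FA3E-426B-895D-97239096AD1F}" /></a:ext>
    <a:ext uri="{C183D7F6-B498-43B3-948B-1728B52AA6E4}"><adec:decorative xmlns:adec="http://schemas.microsoft.com/office/drawing/2017/decorative" val="0" /></a:ext>
  </a:extLst>
</p:cNvPr>
"""

# Map the "adec" prefix to its namespace URI for the lookup below.
NS = {"adec": "http://schemas.microsoft.com/office/drawing/2017/decorative"}

cNvPr = ET.fromstring(SNIPPET)                       # root element is <p:cNvPr>
decorative = cNvPr.find(".//adec:decorative", NS)    # search all descendants
print(cNvPr.get("name"), decorative.get("val"))      # prints: Picture 2 0
```

The prefix in the lookup only has to map to the same URI as in the file; the prefix text itself ("adec") could be anything.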
Since I've only recently started using this package, I'm not sure how to go about creating custom element classes within python-pptx. If anyone has other workarounds or suggestions, please let me know. Thank you!
Creating a custom element class would certainly work, but I would regard it as an extreme method (think bazooka for killing mosquitoes) :).
I'd be inclined to think you could accomplish what you want with an XPath query on the closest ancestor you can get to with python-pptx.
Something like this would be in the right direction:
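# shape is a python-pptx shape, e.g. from `for shape in slide.shapes:`
cNvPr = shape._element._nvXxPr.cNvPr  # the <p:cNvPr> element shown in the question
# .xpath() resolves prefixes via python-pptx's namespace map, so "adec" must be registered (see below)
adec_decoratives = cNvPr.xpath(".//adec:decorative")
if adec_decoratives:
    print("got one, probably need to look more closely at them")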
One of the challenges is likely to be getting the adec namespace prefix registered, because I don't think it is registered by default. So you probably need to execute this code before the XPath expression, possibly before loading the first document:
# add the "adec" prefix to python-pptx's (private) prefix-to-URI map
from pptx.oxml.ns import _nsmap
_nsmap["adec"] = "http://schemas.microsoft.com/office/drawing/2017/decorative"
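If mutating the private _nsmap feels fragile, a different technique (not the one above) is to pass the namespace mapping explicitly to the element's find() method. The sketch below does that and folds in the alt-text check from the question. It rests on assumptions worth verifying: that alt text is stored in the descr attribute of <p:cNvPr>, that val="1" (or "true") marks a shape decorative while val="0", as in the question's XML, does not, and that lxml elements (which python-pptx uses) accept find(path, namespaces) just like ElementTree's. The helper names, including the test-only make_cNvPr, are hypothetical.

```python
import xml.etree.ElementTree as ET

P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
ADEC_NS = {"adec": "http://schemas.microsoft.com/office/drawing/2017/decorative"}


def is_decorative(cNvPr) -> bool:
    """True if <p:cNvPr> has an <adec:decorative> whose val is "1" or "true" (assumed meaning)."""
    marker = cNvPr.find(".//adec:decorative", ADEC_NS)
    return marker is not None and marker.get("val") in ("1", "true")


def is_inaccessible(cNvPr) -> bool:
    """The question's rule: no alt text (assumed to be descr) AND not decorative."""
    alt_text = (cNvPr.get("descr") or "").strip()
    return not alt_text and not is_decorative(cNvPr)


def make_cNvPr(descr=None, decorative_val=None):
    """Hypothetical test helper: builds a stand-in for shape._element._nvXxPr.cNvPr."""
    attrs = {"id": "3", "name": "Picture 2"}
    if descr is not None:
        attrs["descr"] = descr
    el = ET.Element(f"{{{P_NS}}}cNvPr", attrs)
    if decorative_val is not None:
        ext_lst = ET.SubElement(el, f"{{{A_NS}}}extLst")
        ext = ET.SubElement(ext_lst, f"{{{A_NS}}}ext")
        ET.SubElement(ext, f"{{{ADEC_NS['adec']}}}decorative", {"val": decorative_val})
    return el


samples = {
    "no alt text, no marker": make_cNvPr(),
    "no alt text, val=0": make_cNvPr(decorative_val="0"),
    "marked decorative": make_cNvPr(decorative_val="1"),
    "has alt text": make_cNvPr(descr="Company logo"),
}
for label, el in samples.items():
    print(f"{label}: inaccessible={is_inaccessible(el)}")
```

With python-pptx, the intent is to call is_inaccessible(shape._element._nvXxPr.cNvPr) for each shape on each slide; that path is untested here, so check it against a real deck first.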
Also, if you research XPath a bit, I think you'll be able to query directly for <adec:decorative> elements that have val=0, or whatever specific attribute state satisfies what you're looking for.
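An attribute predicate of the form [@val='...'] does this filtering. The sketch below shows it with the standard-library ElementTree, which supports this subset of XPath, on a made-up root holding one marker of each state. Assuming the adec prefix has been registered as described above, the same expression string should also work with python-pptx's cNvPr.xpath().

```python
import xml.etree.ElementTree as ET

NS = {"adec": "http://schemas.microsoft.com/office/drawing/2017/decorative"}

# Hypothetical container with two <adec:decorative> markers, one per val state.
root = ET.fromstring(
    '<root xmlns:adec="http://schemas.microsoft.com/office/drawing/2017/decorative">'
    '<adec:decorative val="0"/>'
    '<adec:decorative val="1"/>'
    "</root>"
)

# The [@val='...'] predicate keeps only elements whose val attribute matches.
val_0 = root.findall(".//adec:decorative[@val='0']", NS)
val_1 = root.findall(".//adec:decorative[@val='1']", NS)
print(len(val_0), len(val_1))  # prints: 1 1
```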
But this is the direction I recommend. Maybe you can post your results once you've worked them out in case someone else faces the same problem later.