I have many .pptx files in a directory and want to find out which of them contain the word "data". I wrote the code below. It reads all the files, but it does not report the correct True or False. For example, in Person1.pptx the word "data" appears in two shapes. Where exactly is the mistake, and why does the code produce incorrect results?
from pptx import Presentation
import os

files = [x for x in os.listdir("C:/Users/.../Desktop/Test") if x.endswith(".pptx")]
for eachfile in files:
    prs = Presentation("C:/Users/.../Desktop/Test/" + eachfile)
    print(eachfile)
    print("----------------------")
    for slide in prs.slides:
        for shape in slide.shapes:
            print("Exist? " + str(hasattr(shape, 'data')))
The result is:
Person1.pptx
----------------------
Exist? False
Exist? False
Exist? False
Exist? False
Exist? False
Exist? False
Exist? False
Exist? False
Person2.pptx
----------------------
Exist? False
Exist? False
Exist? False
Exist? False
Exist? False
Exist? False
Exist? False
Exist? False
Exist? False
Exist? False
Exist? False
I expected the code to find the word "data" in one of the slides and print True. The expected result would be:
Person1.pptx
----------------------
Exist? True
Person2.pptx
----------------------
Exist? False
That is, True if the word exists in any of the shapes in a slide, and False if none of the shapes in the slide contains it.
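I found the solution myself. :) The mistake was that hasattr(shape, 'data') only checks whether the shape object has an attribute named "data"; it never looks at the text inside the shape. The text has to be read from shape.text and searched: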
from pptx import Presentation
import os

files = [x for x in os.listdir("C:/Users/.../Desktop/Test") if x.endswith(".pptx")]
for eachfile in files:
    prs = Presentation("C:/Users/.../Desktop/Test/" + eachfile)
    for slide in prs.slides:
        for shape in slide.shapes:
            # Not every shape exposes a .text attribute (pictures, for example, do not).
            if hasattr(shape, "text"):
                # Lowercase a copy for a case-insensitive match; do not overwrite shape.text.
                text = shape.text.lower()
                if "whatever_you_are_looking_for" in text:
                    print(eachfile)
                    print("----------------------")
                    # Stop checking the remaining shapes on this slide.
                    break
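
To show the difference between checking for an attribute and searching the text, here is a small sketch that runs without any .pptx file. The helper names shape_contains and slide_contains are my own (hypothetical, not part of python-pptx), and the SimpleNamespace objects are only stand-ins for real shapes.

```python
from types import SimpleNamespace


def shape_contains(shape, word):
    """Return True if the shape exposes text that contains `word` (case-insensitive)."""
    # Read the text without assigning back to shape.text, so the shape is not modified.
    text = getattr(shape, "text", None)
    return isinstance(text, str) and word.lower() in text.lower()


def slide_contains(shapes, word):
    """Return True if any shape in the iterable contains `word`."""
    return any(shape_contains(shape, word) for shape in shapes)


# Hypothetical stand-ins for shapes: two with text, one without (like a picture).
shapes = [
    SimpleNamespace(text="Quarterly Data review"),
    SimpleNamespace(text="Agenda"),
    SimpleNamespace(name="Picture 1"),
]

print(hasattr(shapes[0], "data"))      # False: there is no attribute named "data"
print(slide_contains(shapes, "data"))  # True: the text of the first shape contains it
```

With python-pptx, the same helper should work as slide_contains(slide.shapes, "data") for each slide in prs.slides, which gives one True or False per slide instead of one line per shape.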