Searching for keywords in a directory of PowerPoint files


I am trying to write a Python program that searches PowerPoint slides for a keyword. This is what I have so far, but I keep getting an error saying that the file is not a zip file: "zipfile.BadZipFile: File is not a zip file". Thank you.

from pptx import Presentation
import os

def main():

    while(True):
        search = input("Keyword: ")
        result = []
        for filename in os.listdir():
            f = open(filename)
            pres = Presentation(f)
            for slide in pres.slides:
                for shape in slide.shapes:
                    if not shape.has_text_frame:
                        continue
                    for paragraph in shape.text_frame.paragraphs:
                        for run in paragraph.runs:
                            if search in run.text:
                                result.append(run.text)
                                result.append(" - ")
                                result.append(filename)
                            else:
                                continue
            f.close()
            print(result)
if __name__ == '__main__':
    main()

Solution

  • Presentation() accepts a file path, so you do not need to open the file yourself. Pass the filename directly:

    prs = Presentation(filename)
    

    Also make sure that every file you pass is in fact a PPTX file. os.listdir() returns everything in the directory, which may include the script itself, so skip anything else with a few lines like:

    for filename in os.listdir():
        if not filename.endswith('.pptx'):
            continue
        prs = Presentation(filename)
    

    If you do want to pass an open file for some reason, open it in binary mode, since a .pptx file is a zip archive and has to be read as bytes:

    f = open(filename, 'rb')
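
Putting those points together, here is a minimal sketch of how the whole search might look. The function names find_pptx_files and search_presentations are made up for illustration, and the zipfile.is_zipfile() check is an extra guard that was not part of the advice above: it skips files that carry a .pptx extension but are not valid zip archives. It assumes python-pptx is installed.

```python
import os
import zipfile


def find_pptx_files(directory="."):
    """Return paths of files in `directory` that look like real PPTX files.

    Hypothetical helper, named here for illustration only.
    """
    paths = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip anything without the .pptx extension (scripts, folders, ...).
        if not name.lower().endswith(".pptx"):
            continue
        # A .pptx file is a zip archive; skip files that are not valid zips.
        if not zipfile.is_zipfile(path):
            continue
        paths.append(path)
    return paths


def search_presentations(keyword, directory="."):
    """Return (filename, run text) pairs for each text run containing `keyword`."""
    # Imported here so the file filter above works without python-pptx installed.
    from pptx import Presentation

    matches = []
    for path in find_pptx_files(directory):
        prs = Presentation(path)  # pass the path, not a text-mode file object
        for slide in prs.slides:
            for shape in slide.shapes:
                if not shape.has_text_frame:
                    continue
                for paragraph in shape.text_frame.paragraphs:
                    for run in paragraph.runs:
                        if keyword in run.text:
                            matches.append((os.path.basename(path), run.text))
    return matches
```

Calling search_presentations(input("Keyword: ")) and printing each pair gives roughly the output the original loop was aiming for. Collecting the matches as (filename, text) pairs avoids interleaving " - " strings in one flat list.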