Search code examples
pythonpython-3.xif-statementos.walkpython-zipfile

Create a zip with only .pdf and .xml files from one directory


I would love to know how i can zip only all pdfs from the main directory without including the subfolders.

I've tried several times changing the code, without any succes with what i want to achieve.

import zipfile
 
fantasy_zip = zipfile.ZipFile('/home/rob/Desktop/projects/zenjobv2/archivetest.zip', 'w')
 
for folder, subfolders, files in os.walk('/home/rob/Desktop/projects/zenjobv2/'):
 
    for file in files:
        if file.endswith('.pdf'):
            fantasy_zip.write(os.path.join(folder, file), os.path.relpath(os.path.join(folder,file), '/home/rob/Desktop/projects/zenjobv2/'), compress_type = zipfile.ZIP_DEFLATED)
        elif file.endswith('.xml'):
            fantasy_zip.write(os.path.join(folder, file), os.path.relpath(os.path.join(folder,file), '/home/rob/Desktop/projects/zenjobv2/'), compress_type = zipfile.ZIP_DEFLATED)
fantasy_zip.close()

I expect that a zip is created only with the .pdfs and .xml files from the zenjobv2 folder/directory without including any other folders/subfolders.


Solution

  • You are looping through the entire directory tree with os.walk(). It sounds like you want to just look at the files in a given directory. For that, consider os.scandir(), which returns an iterator of all files and subdirectories in a given directory. You will just have to filter out elements that are directories:

    root = "/home/rob/Desktop/projects/zenjobv2"
    for entry in os.scandir(root):
        if entry.is_dir():
            continue  # Just in case there are strangely-named directories
        if entry.path.endswith(".pdf") or entry.path.endswith(".xml"):
            # Process the file at entry.path as you see fit