python python-3.x scripting os.walk

os.walk best approach for making dirs and extracting .zip file into the newly made dirs


Suppose I have a root directory containing n project folders. Each project folder contains one .zip file with 2 files in it: dpd.zip and lib.ear.

I want to write a script that, when placed inside the root folder, iterates over all of the project directories. Inside each project it should create a new directory named "core" with 2 sub directories named "zip" and "ear". The script should then extract the project's .zip file, move dpd.zip and lib.ear into the relevant sub folder, and finally extract their contents and delete the archives once the job is done.

What I have tried so far:

import os
import shutil
import zipfile

def extract_zip(file_path, extract_to):
    with zipfile.ZipFile(file_path, 'r') as zip_ref:
        zip_ref.extractall(extract_to)
    
for root, dirs, files in os.walk("."):
    for subdir in dirs:
        # join with root so the paths stay valid wherever os.walk currently is
        project_dir = os.path.join(root, subdir)
        core_dir = os.path.join(project_dir, "core")
        ear_dir = os.path.join(core_dir, "ear")
        zip_dir = os.path.join(core_dir, "zip")

        # skip projects that already have a core folder
        if os.path.isdir(core_dir):
            continue

        os.mkdir(core_dir)
        os.mkdir(ear_dir)
        os.mkdir(zip_dir)

        # so far the tree was made inside each project's folder

        for fl in os.listdir(project_dir):
            file_path = os.path.join(project_dir, fl)

            if fl.endswith(".zip") and os.path.isfile(file_path):
                extract_zip(file_path, core_dir)
                # so far the project's .zip is extracted into its core folder

                # here comes the obstacle

    # only the top-level project folders matter, so do not descend any further
    break

The obstacle comes when I try to add logic that extracts the .zip and the .ear into their respective subdirs. The only solution I have come up with so far is opening another for loop specifically for the core subdir and adding logic for the files in it (.zip/.ear/other files) that extracts them and removes the extracted source. But because the root folder contains a huge number of projects, this solution is not efficient and will take forever to complete.

Do you think there is a workaround here?

Thanks in advance for your support.

Update:

The final tree should look like:

root/
    Project1/
        core/
            zip/
                # extracted files from dpd.zip
            ear/
                # extracted files from lib.ear
            # extracted files from core without .zip .ear extension files
    .
    .
    .

    ProjectN/
        core/
            zip/
                # extracted files from dpd.zip
            ear/
                # extracted files from lib.ear
            # extracted files from core without .zip .ear extension files

Solution

  • I didn't really use your whole script, but here is my initial attempt. It works for me, so you can check whether it works for you.

    import os
    import zipfile

    def extract_zip(file_path, extract_to):
        with zipfile.ZipFile(file_path, 'r') as zip_ref:
            zip_ref.extractall(extract_to)

    for filename in os.listdir("."):
        f = os.path.join(".", filename)
        # only project folders are of interest; skip script.py and other files
        if os.path.isfile(f):
            continue

        core_dir = os.path.join(f, "core")
        ear_dir = os.path.join(core_dir, "ear")
        zip_dir = os.path.join(core_dir, "zip")

        # makedirs also creates core/ and does not fail if the script is re-run
        os.makedirs(ear_dir, exist_ok=True)
        os.makedirs(zip_dir, exist_ok=True)

        # list all files in project folder
        for zip_file in os.listdir(f):
            potential_zip_file = os.path.join(f, zip_file)

            # check if file is indeed file and is zip file
            if os.path.isfile(potential_zip_file) and potential_zip_file.endswith(".zip"):
                # extract it in project/core folder
                extract_zip(potential_zip_file, core_dir)

        # sort the extracted archives into their sub folders and remove them
        for name in os.listdir(core_dir):
            curr_file = os.path.join(core_dir, name)
            if curr_file.endswith(".zip"):
                extract_zip(curr_file, zip_dir)
                os.remove(curr_file)
            elif curr_file.endswith(".ear"):
                extract_zip(curr_file, ear_dir)
                os.remove(curr_file)
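A side note on the listing step: since the projects are direct children of the root, there is no need for the recursion that os.walk does. A minimal sketch of that idea with os.scandir follows; iter_project_dirs is a name I made up for illustration, and it assumes every sub directory of the root is a project.

```python
import os

def iter_project_dirs(root="."):
    """Yield the paths of the immediate sub directories of root (hypothetical helper)."""
    with os.scandir(root) as entries:
        for entry in entries:
            # plain files such as script.py are skipped
            if entry.is_dir():
                yield entry.path
```

Each yielded path can then be handled the same way as f in the loop above.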
    
    

    This is the initial state:

    root/
      proj1/
        zipped.zip
          ├── dpd.zip
          │   └── ...something
          └── lib.ear
      proj2/
        zipped.zip
          ├── dpd.zip
          │   └── ...something
          └── lib.ear
      script.py
    

    This is the state after running the script:

    root/
      proj1/
        core/
          zip/
            ...something
          ear/
            lib.ear
      proj2/
        core/
          zip/
            ...something
          ear/
            lib.ear
      script.py
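
If the extra pass over core/ is what worries you, one possible variation is to open the nested archives straight from the outer .zip in memory, so dpd.zip and lib.ear are never written to core/ and never need to be deleted afterwards. This is only a sketch under a few assumptions: unpack_project is a made-up name, the .ear is a regular zip-format archive, and the nested archives are small enough to hold in memory.

```python
import io
import os
import zipfile

def unpack_project(project_dir):
    """Unpack a project's outer .zip into core/, core/zip/ and core/ear/ (hypothetical helper)."""
    core_dir = os.path.join(project_dir, "core")
    # nested archives are routed by extension to their target folder
    targets = {
        ".zip": os.path.join(core_dir, "zip"),
        ".ear": os.path.join(core_dir, "ear"),
    }
    for target in targets.values():
        os.makedirs(target, exist_ok=True)

    for name in os.listdir(project_dir):
        outer_path = os.path.join(project_dir, name)
        if not (os.path.isfile(outer_path) and name.endswith(".zip")):
            continue
        with zipfile.ZipFile(outer_path) as outer:
            for member in outer.infolist():
                ext = os.path.splitext(member.filename)[1].lower()
                if ext in targets and not member.is_dir():
                    # read the nested archive into memory instead of writing it to core/
                    nested_bytes = io.BytesIO(outer.read(member))
                    with zipfile.ZipFile(nested_bytes) as nested:
                        nested.extractall(targets[ext])
                else:
                    # everything else goes into core/ unchanged
                    outer.extract(member, core_dir)
```

You would call unpack_project once per project folder, for example on every directory entry returned by os.scandir("."). The outer .zip is left in place; removing it with os.remove after a successful extraction is a separate decision.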