
Remove multiple subdirectories with the same name in Python 3.X


Currently I'm doing some experiments in Python and have some Jupyter notebooks for evaluation. I run each experiment multiple times with different parameters, so my folder structure looks something like this:

root
  |-- .ipynb_checkpoints
  |-- idea 1
  |      |-- .ipynb_checkpoints
  |      |-- run 1
  |      |      |-- .ipynb_checkpoints
  |      |      |-- results & evaluation
  |      |-- run 2
  |      |      |-- .ipynb_checkpoints
  |      |      ...
  |      ...
  |-- idea 2
  |      |-- .ipynb_checkpoints
  |      ...
  ...

When I archive the experiments, I want to get rid of the .ipynb_checkpoints folders, as they are no longer necessary in my opinion. For this I wrote a quick and dirty little script.

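import os
import re
import shutil

# Match any path that ends in .ipynb_checkpoints.
r = re.compile(r'(^.*\.ipynb_checkpoints$)')
dirs = []

# First pass: collect every matching directory path.
for dirpath, _, _ in os.walk('.', topdown=True):
    if r.match(dirpath):
        dirs.append(dirpath)

# Second pass: delete the collected directories.
for d in dirs:
    shutil.rmtree(d)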

Basically, I create a regular expression matching the desired folder name, walk through all subfolders, store each matching path in dirs, and then loop a second time to delete all .ipynb_checkpoints folders. The script works fine. However, I'm not satisfied with the code, especially the two for-loops.

Since I'm still learning how to program, I'm wondering whether there is a more Pythonic way to do this. Any comment on how to make the code faster (although the script currently takes less than a second), more Pythonic, or cleaner is appreciated. Thanks for your help!


Solution

    In[2]: from glob import glob
    In[3]: list(glob('**/.ipynb_checkpoints', recursive=True))
    Out[3]: 
    ['root_dir/.ipynb_checkpoints',
     'root_dir/idea_2/.ipynb_checkpoints',
     'root_dir/idea_1/.ipynb_checkpoints',
     'root_dir/idea_1/run_1/.ipynb_checkpoints',
     'root_dir/idea_1/run_2/.ipynb_checkpoints']
    

    In[4]: from pathlib import Path
    In[5]: list(Path().rglob('.ipynb_checkpoints'))
    Out[5]: 
    [PosixPath('root_dir/.ipynb_checkpoints'),
     PosixPath('root_dir/idea_2/.ipynb_checkpoints'),
     PosixPath('root_dir/idea_1/.ipynb_checkpoints'),
     PosixPath('root_dir/idea_1/run_1/.ipynb_checkpoints'),
     PosixPath('root_dir/idea_1/run_2/.ipynb_checkpoints')]
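
The listings above only find the folders. To actually delete them in a single loop, one possible sketch combines rglob with shutil.rmtree. The function name remove_checkpoint_dirs and its parameters are my own choice for illustration, not something from the question:

```python
import shutil
from pathlib import Path


def remove_checkpoint_dirs(root=".", name=".ipynb_checkpoints"):
    """Delete every directory called `name` below `root`; return the removed paths.

    Hypothetical helper for illustration; the name and parameters are assumptions.
    """
    # Collect the matches first so the tree is not modified while it is being searched.
    matches = [p for p in Path(root).rglob(name) if p.is_dir()]
    removed = []
    for path in matches:
        # A match nested inside an already deleted match no longer exists, so skip it.
        if path.exists():
            shutil.rmtree(path)
            removed.append(path)
    return removed
```

Calling remove_checkpoint_dirs('.') from the root folder removes the checkpoint folders and returns the paths it deleted, which is handy for a quick sanity check before archiving. Since rmtree is irreversible, it may be worth printing the matches once before running the deletion for real.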