python directory subdirectory python-os

python os.walk to certain level


I want to build a program that uses some basic code to read through a folder and tell me how many files are in the folder. Here is how I do that currently:

import os

folders = ['Y:\\path1', 'Y:\\path2', 'Y:\\path3']
for stuff in folders:
    for root, dirs, files in os.walk(stuff, topdown=True):
        print("there are", len(files), "files in", root)

This works great until there are multiple folders inside the "main" folder, because it then returns a long, junky list of files due to poor folder/file management. So I would like to go down to the second level at most. Example:

Main Folder
---file_i_want
---file_i_want
---Sub_Folder
------file_i_want <--*
------file_i_want <--*
------Sub_Folder_2
---------file_i_dont_want
---------file_i_dont_want

I know how to stop at the first level, either with a break or with del dirs[:], taken from this post and also this post.

import os

folders = ['Y:\\path1', 'Y:\\path2', 'Y:\\path3']
for stuff in folders:
    for root, dirs, files in os.walk(stuff, topdown=True):
        print("there are", len(files), "files in", root)
        del dirs[:]  # or a break here; both stop after the first level

But no matter how much I search, I can't find out how to go two layers deep. I may just not be understanding the other posts on it. I was thinking of something like del dirs[:2], but to no avail. Can someone guide me or explain to me how to accomplish this?


Solution

  • You could do it like this:

    import os

    depth = 2

    # [1] abspath() also normalizes the path like normpath(), removing any trailing
    #     os.sep; the slicing below relies on there being no trailing separator.
    # [2] abspath() also resolves "/../", "////" and ".", which os.walk would
    #     otherwise return literally.
    # [3] expanduser() expands ~
    # [4] expandvars() expands $HOME
    # WARN: don't use [3] expanduser() or [4] expandvars() if stuff is an arbitrary
    #       string outside your control.
    #stuff = os.path.expanduser(os.path.expandvars(stuff))  # only for a trusted source
    stuff = os.path.abspath(stuff)

    for root, dirs, files in os.walk(stuff):
        # separators left after stripping the stuff prefix = levels below stuff
        if root[len(stuff):].count(os.sep) < depth:
            for f in files:
                print(os.path.join(root, f))
    

    The key is: if root[len(stuff):].count(os.sep) < depth

    It strips the stuff prefix from root, so the remainder is relative to stuff. Then just count the number of path separators in it.
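To make the slicing step concrete, here is a small sketch that computes the same count for a few paths. The helper name relative_depth and the sample folder names are made up for illustration, and it assumes the base path has no trailing separator, which abspath() takes care of in the snippet above.

```python
import os

def relative_depth(base, root):
    """Count the path separators left in root once the base prefix is sliced off."""
    return root[len(base):].count(os.sep)

base = "main"  # hypothetical top folder, no trailing separator
paths = [
    base,
    os.path.join(base, "Sub_Folder"),
    os.path.join(base, "Sub_Folder", "Sub_Folder_2"),
]
depths = [relative_depth(base, p) for p in paths]
print(depths)  # [0, 1, 2]
```

With depth = 2, only the first two paths pass the `< depth` test, so files in Sub_Folder_2 are skipped.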

    depth acts like the -maxdepth option of the Linux find command: with depth = 0 nothing is printed, with depth = 1 only the files directly in stuff are listed, and with depth = 2 the files in its immediate sub-directories are included as well.

    Of course, os.walk still scans the full directory tree; only the printing is filtered. Unless the tree is very deep, that works fine.
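If the full scan is a concern, one possible variation is to combine the separator count with the del dirs[:] trick from the question, so that os.walk stops descending once the limit is reached. This is only a sketch: the name walk_limited and its max_depth parameter are hypothetical, and it uses the same separator-counting test as above (so it shares its assumption that the top path has no trailing separator).

```python
import os

def walk_limited(top, max_depth=2):
    """Yield (root, dirs, files) like os.walk, visiting at most max_depth levels."""
    if max_depth < 1:
        return
    top = os.path.abspath(top)
    for root, dirs, files in os.walk(top, topdown=True):
        level = root[len(top):].count(os.sep)  # 0 for top itself
        if level >= max_depth - 1:
            # Emptying dirs in place tells os.walk (topdown=True) not to descend.
            # As a side effect, the caller sees an empty dirs list at this level.
            del dirs[:]
        yield root, dirs, files

if __name__ == "__main__":
    for root, dirs, files in walk_limited(".", max_depth=2):
        print("there are", len(files), "files in", root)
```

With max_depth=2 this visits the top folder and its immediate sub-folders only, and never lists anything below them.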

    Another solution would be to call os.listdir recursively (with a directory check) up to a maximum recursion level, but that's a little more work if you don't need it. Since it's not that hard, here's one implementation:

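    import os

    def scanrec(root, max_depth=2):
        rval = []

        def do_scan(start_dir, output, depth=1):
            for f in os.listdir(start_dir):
                ff = os.path.join(start_dir, f)
                if os.path.isdir(ff):
                    # recurse only while the next level is still within max_depth,
                    # so max_depth=2 covers root and its immediate sub-directories
                    if depth < max_depth:
                        do_scan(ff, output, depth + 1)
                else:
                    output.append(ff)

        do_scan(root, rval)
        return rval

    print(scanrec(stuff))  # prints the files in stuff and in its immediate sub-directories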
    

    Note: calling os.path.isdir on every name returned by os.listdir costs an extra stat call per entry, so this is not optimal. Since Python 3.5, os.scandir can avoid that extra call in most cases.
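As a rough illustration of that note, here is a sketch of the same depth-limited scan written with os.scandir. The function name scan_files and its max_depth parameter are made up for this example; using os.scandir as a context manager requires Python 3.6 or later.

```python
import os

def scan_files(top, max_depth=2):
    """Return the paths of files found at most max_depth levels below top."""
    found = []
    if max_depth < 1:
        return found
    with os.scandir(top) as entries:
        for entry in entries:
            # DirEntry.is_dir() can usually answer from the directory listing
            # itself, without a separate stat call per entry.
            if entry.is_dir():
                found.extend(scan_files(entry.path, max_depth - 1))
            else:
                found.append(entry.path)
    return found

if __name__ == "__main__":
    print(scan_files(".", max_depth=2))
```

Here max_depth=2 returns the files in the top folder and in its immediate sub-folders, in line with the depth = 2 example above.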