How to iterate through folder and get certain files from the subfolders grouped?


I have a folder that contains several subfolders, each containing 3-4 files that I need. I am trying to iterate through that folder and put all the files from each subfolder in a dictionary that is later dumped in a json file.

So far I've managed to build a flat list with one entry per file, so the JSON file contains objects of the form {"id": ..., "name": ...}. This is the code:

import os
import json
myDir = r"\\iads011n\ContinuousTesting\DailyTesting\REPORTS"  # raw string: backslashes in the UNC path are taken literally
filelist = []
for path, subdirs, files in os.walk(myDir):
    for file in files:
        if (file.endswith('.xlsx') or file.endswith('.xls') or file.endswith('.XLS')) and "Release" in file and "Integrated" not in file:
            filelist.append(file)  # os.path.join(file) with a single argument just returns file

myDict = dict(zip(range(len(filelist)), filelist))

result=[]
for k,v in myDict.items():
    result.append({'id' : k, 'name' : v})

with open('XLList.json', 'w') as json_file:
    json.dump(result, json_file)

But what I'm trying to achieve is one entry per subfolder, with that subfolder's files grouped under it. (The original post showed screenshots of the target JSON, of the folder, and of one subfolder's contents.)

So basically what I need is all the xls/xlsx files under the same subfolder grouped together. The major problem is that not all the subfolders contain the same items: some may have only one xlsx file, while others may have 3 or 4.


Solution

  • The problem is that you never store which folder each file belongs to: every match goes into one flat list. A solution is to build one dictionary per folder that os.walk visits:

    result = []
    for i, (path, subdirs, files) in enumerate(os.walk(myDir)): #use enumerate to track folder id
        subdir = {"id": i}
        j = 0 #file counter in subfolder
        for file in files:
            if (file.endswith('.xlsx') or file.endswith('.xls') or file.endswith('.XLS')) and "Release" in file and "Integrated" not in file:
                subdir[f"name{j}"] = file
                j += 1
        result.append(subdir)
    

    EDIT

    To ignore folders with no useful files:

    result = []
    i = 0 #track folder id manually
    for path, subdirs, files in os.walk(myDir): 
        subdir = {}
        j = 0 #file counter in subfolder
        for file in files:
            if (file.endswith('.xlsx') or file.endswith('.xls') or file.endswith('.XLS')) and "Release" in file and "Integrated" not in file:
                subdir[f"name{j}"] = file
                j += 1
        if subdir: #only keep folders with at least one matching file
            result.append({"id": i, **subdir}) #add the id without discarding the collected names
            i += 1 #increase counter
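
    For reference, the same idea can be sketched as a small self-contained script. The helper names `is_wanted` and `group_release_files` and the temporary demo folder are made up for illustration and are not part of the original code. Sorting the folder and file names is also an added assumption, used only to make the ids reproducible between runs.

```python
import json
import os
import tempfile

EXCEL_SUFFIXES = (".xlsx", ".xls", ".XLS")  # same suffixes as in the question


def is_wanted(filename):
    """Same filter as above: Excel file with "Release" but not "Integrated" in its name."""
    return (
        filename.endswith(EXCEL_SUFFIXES)  # str.endswith accepts a tuple of suffixes
        and "Release" in filename
        and "Integrated" not in filename
    )


def group_release_files(root):
    """Return one dict per folder under root that holds at least one wanted file."""
    result = []
    for path, subdirs, files in os.walk(root):
        subdirs.sort()  # assumption: visit subfolders alphabetically for stable ids
        names = sorted(f for f in files if is_wanted(f))
        if not names:
            continue  # ignore folders with no useful files
        entry = {"id": len(result)}  # id counts only the folders that were kept
        for j, name in enumerate(names):
            entry[f"name{j}"] = name
        result.append(entry)
    return result


if __name__ == "__main__":
    # Hypothetical folder tree, created in a temp dir only for this demo.
    demo_tree = {
        "A": ["Release_1.xlsx", "Integrated_Release.xlsx"],
        "B": ["notes.txt"],
        "C": ["Release_2.xls", "Release_3.XLS"],
    }
    with tempfile.TemporaryDirectory() as root:
        for folder, names in demo_tree.items():
            os.makedirs(os.path.join(root, folder))
            for name in names:
                open(os.path.join(root, folder, name), "w").close()
        print(json.dumps(group_release_files(root), indent=2))
```

    To write the result to disk as in the question, pass the returned list to json.dump together with an open file, for example the XLList.json file used there.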