What is the most efficient way to get the paths of the subfolders that contain files? For example, suppose this is my input structure:
inputFolder
│
├───subFolder1
│   │
│   └───subFolder11
│           file1.jpg
│           file2.jpg
│           ...
│
└───folder2
        file021.jpg
        file022.jpg
If I call getFolders(inputPath), it should return a list of the folders containing images: ['inputFolder/subFolder1/subFolder11', 'inputFolder/folder2']
Currently I'm using my library TreeHandler, which is just a wrapper around os.walk, to get all the files:
import os
from treeHandler import treeHandler
th=treeHandler()
tempImageList=th.getFiles(path,['jpg'])
# tempImageList is a list of the paths of all files with a '.jpg' extension.
# Now comes the filtering step; the line below is the one that needs optimising.
subFolderList=list(set(list(map(lambda x:os.path.join(*x.split('/')[:-1]),tempImageList))))
I think it can be done more efficiently.
Thanks in advance
Finding the index of the last '/' and slicing the string there works much faster than splitting the path on '/' and rejoining it with os.path.join.
def remove_tail(path):
    index = path.rfind('/')  # index of the last '/' in path, or -1 if there is none
    # A path without '/' is a bare filename, so return '.' (the current directory)
    return path[:index] if index != -1 else '.'
.
.
.
subFolderList = list(set([remove_tail(path) for path in tempImageList]))
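As a small illustration of the slicing approach, here is a sketch that runs `remove_tail` on a few made-up paths modeled on the example tree in the question. The sample paths are assumptions for illustration only, not real data. The standard-library `os.path.dirname` is shown for comparison; note that it returns an empty string rather than '.' for a bare filename.

```python
import os

def remove_tail(path):
    index = path.rfind('/')
    return path[:index] if index != -1 else '.'

# Hypothetical sample paths modeled on the question's folder tree
sample_paths = [
    'inputFolder/subFolder1/subFolder11/file1.jpg',
    'inputFolder/subFolder1/subFolder11/file2.jpg',
    'inputFolder/folder2/file021.jpg',
    'inputFolder/folder2/file022.jpg',
]

# A set comprehension deduplicates without building an intermediate list
sub_folders = sorted({remove_tail(p) for p in sample_paths})
print(sub_folders)

# os.path.dirname gives the same folders for these paths
dirname_folders = sorted({os.path.dirname(p) for p in sample_paths})
print(sub_folders == dirname_folders)
```

Sorting is only there to make the printed order stable, since sets are unordered.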
Verified on AWA2 dataset folders (50 folders and 37,322 images).
Adding the code used for verification
import os
from treeHandler import treeHandler
import time
def remove_tail(path):
    index = path.rfind('/')
    return path[:index] if index != -1 else '.'
th=treeHandler()
tempImageList = th.getFiles('JPEGImages', ['jpg'])
# tempImageList is a list of the paths of all files with a '.jpg' extension.
# Now comes the filtering step; the two methods are timed below.
print(len(tempImageList))
start = time.time()
originalSubFolderList=list(set(list(map(lambda x:os.path.join(*x.split('/')[:-1]),tempImageList))))
print("Current method takes", time.time() - start)
start = time.time()
newSubFolderList = list(set([remove_tail(path) for path in tempImageList]))
print("New method takes", time.time() - start)
# Compare as sets: a list built from a set has no guaranteed order
print("Do the outputs match:", set(originalSubFolderList) == set(newSubFolderList))
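Since treeHandler is described in the question as a wrapper around os.walk, another option may be to skip the per-file path list and collect the folders during the walk itself. The following is only a sketch of that idea using `os.walk` directly; `get_folders` and its `extensions` parameter are hypothetical names, not part of treeHandler, and it has not been timed against the methods above.

```python
import os

def get_folders(input_path, extensions=('.jpg',)):
    """Return the folders under input_path that directly contain matching files."""
    extensions = tuple(extensions)  # str.endswith needs a str or a tuple of str
    folders = []
    # os.walk yields each directory once, so no deduplication is needed
    for dirpath, _dirnames, filenames in os.walk(input_path):
        if any(name.lower().endswith(extensions) for name in filenames):
            folders.append(dirpath)
    return folders

# Example call (assumes a folder named 'inputFolder' exists):
# print(get_folders('inputFolder'))
```

Each directory is visited once, so the work per folder is a scan of its filenames rather than a string operation on every file path.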