Is my time and space analysis correct for this algorithm?
import os

def find_files(suffix, path):
    if suffix == '':
        return []
    # Base condition: an empty directory yields no matches
    if len(os.listdir(path)) == 0:
        return []
    path_elements = os.listdir(path)
    path_files = [file for file in path_elements if ('.' + suffix) in file]
    path_folders = [folder for folder in path_elements if '.' not in folder]
    for folder in path_folders:
        path_files.extend(find_files(suffix=suffix, path=path + '/' + folder))
    return path_files
# Tests
path_base = os.getcwd() + '/testdir'
print(find_files(suffix='c', path=path_base))
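As a cross-check (not part of the submitted solution), the same search can be sketched with the standard library's os.walk. Note that it matches on the true name ending via str.endswith and returns full paths, whereas the function above matches a substring and returns bare names; the directory layout below is made up for the demo:

```python
import os
import tempfile

def find_files_walk(suffix, path):
    """Collect paths under `path` whose names end with '.<suffix>'."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(path):
        matches.extend(os.path.join(dirpath, name)
                       for name in filenames
                       if name.endswith('.' + suffix))
    return matches

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: root/main.c, root/sub/util.c, root/sub/notes.txt
    os.mkdir(os.path.join(root, "sub"))
    for name in ("main.c", "sub/util.c", "sub/notes.txt"):
        open(os.path.join(root, name), "w").close()
    found = sorted(os.path.basename(p) for p in find_files_walk("c", root))
    print(found)  # ['main.c', 'util.c']
```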
Requirement:
The goal is to write code that finds all files under a directory (and all directories beneath it) that end with ".c".
Summary:
To implement this, I have written a find_files function that takes suffix (the file extension) and path (the directory to search). The function recursively searches for files with the given extension in the parent directory and all of its sub-directories, collects the matching files in a list, and returns that list.
Time and Space Complexity:
Time Complexity:
os.listdir(path) lists all the files and folders in the given path. Since it has to iterate over each item to build the list, it takes linear time. Let e be the length of path_elements => O(len(path_elements)) => O(e).
    path_elements = os.listdir(path)
The next step checks each name for the substring '.' + suffix (e.g. '.c'). If s is the length of a file/folder name, it takes O(s) time to check one name, so for f files it takes O(f*s) time. For practical purposes such as this, it is fair to assume that file names are NOT unboundedly long, so s can be treated as a constant => O(f * 1) => O(f).
    path_files = [file for file in path_elements if ('.' + suffix) in file]
Filtering out the g directories likewise takes linear time => O(g).
    path_folders = [folder for folder in path_elements if '.' not in folder]
The above 3 steps take O(e + f + g) => O(e), since e (len(path_elements)) is the dominant term: path_elements is the combination of path_files and path_folders.
If k is the number of directories (branches) at each level of the tree and depth is its depth, then the recursion makes O(branches^depth) => O(k^depth) calls. But for every recursive call, the above three steps are performed. Therefore, the total time complexity is O((k^depth) * e).
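To see how the call count grows with branching, here is a hypothetical harness (my own, not part of the original submission) that builds a complete k-ary directory tree in a temporary location and counts one traversal call per directory; a complete k-ary tree of depth d contains 1 + k + ... + k^d directories:

```python
import os
import tempfile

def make_uniform_tree(root, k, depth):
    """Create a complete k-ary directory tree of the given depth."""
    if depth == 0:
        return
    for i in range(k):
        child = os.path.join(root, f"dir{i}")
        os.mkdir(child)
        make_uniform_tree(child, k, depth - 1)

calls = 0

def count_calls(path):
    """Count how many directories the recursion visits (one os.listdir per call)."""
    global calls
    calls += 1
    for entry in os.listdir(path):
        full = os.path.join(path, entry)
        if os.path.isdir(full):
            count_calls(full)

with tempfile.TemporaryDirectory() as root:
    make_uniform_tree(root, k=2, depth=3)
    count_calls(root)
    # A complete 2-ary tree of depth 3 has 1 + 2 + 4 + 8 = 15 directories.
    print(calls)  # 15
```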
Space Complexity:
Input Space:
The inputs suffix and path each take O(1) space.
The lists path_elements, path_files, and path_folders together take O(e + f + g) => O(e) space.
Auxiliary Space:
Recursion uses something called a "call stack". When a function calls itself, that call goes on top of the stack. Here the stack grows as deep as the directory tree, i.e. up to depth frames. Therefore, O(depth) space is required in the call stack.
Total Space Complexity => Input Space + Auxiliary Space => O(e + depth)
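The O(depth) claim can be sanity-checked with a small sketch (my addition, not part of the submission) that records the call-stack high-water mark while walking a chain of nested directories:

```python
import os
import tempfile

max_depth = 0

def probe(path, depth=1):
    """Track the deepest simultaneous recursion, i.e. the call-stack high-water mark."""
    global max_depth
    max_depth = max(max_depth, depth)
    for entry in os.listdir(path):
        full = os.path.join(path, entry)
        if os.path.isdir(full):
            probe(full, depth + 1)

with tempfile.TemporaryDirectory() as root:
    # Build a chain root/d/d/d/... of 10 nested directories.
    cur = root
    for _ in range(10):
        cur = os.path.join(cur, "d")
        os.mkdir(cur)
    probe(root)
    print(max_depth)  # 11: the root frame plus one frame per nested directory
```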
I think the complexity is O(path_elements), where path_elements is the total number of files and folders returned by all calls to os.listdir(). Consider a folder that has four files in a single directory:
folder
    zip.c
    foo.c
    bar.c
    bas.c
So your function calls os.listdir(), which returns four files, so n = 4. The code then iterates over the list to find folders, of which it finds none, and again to pick out the .c files, which it adds to the list.
The complexity here is O(n) to call os.listdir(), O(n) to search for folders, and O(n) to pick out the .c files and add them to the list. Altogether O(3*n), which is O(n).
Now consider this directory tree:
folder
    sub1
        zip.c
        foo.c
    sub2
        bar.c
        bas.c
So the first call to find_files calls os.listdir(folder), which returns two folders. Each folder triggers a recursive call to find_files. Altogether there are three calls to os.listdir(), and each one returns two items, so the total number of items returned by os.listdir() is 6.
Now imagine you had:
folder
    sub0
    sub1
    sub2
    sub3
    ...
    sub30
    file1.c
The complexity here is still O(n); in this case, n is 31. The point I'm trying to make here is that you look at each item returned from os.listdir() a constant number of times. Thus, O(n).
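To make this concrete, here is an instrumentation sketch (my addition, using the two-subfolder layout from above): it wraps os.listdir to total the items it returns while running a condensed version of the question's find_files, and reproduces the count of 6 items:

```python
import os
import tempfile

def find_files(suffix, path):
    """Condensed version of the question's function (empty-dir base case omitted)."""
    if suffix == '':
        return []
    path_elements = os.listdir(path)
    path_files = [f for f in path_elements if ('.' + suffix) in f]
    path_folders = [d for d in path_elements if '.' not in d]
    for folder in path_folders:
        path_files.extend(find_files(suffix, path + '/' + folder))
    return path_files

items_returned = 0
real_listdir = os.listdir

def counting_listdir(path):
    """Delegate to the real os.listdir, totalling how many items it returns."""
    global items_returned
    entries = real_listdir(path)
    items_returned += len(entries)
    return entries

with tempfile.TemporaryDirectory() as root:
    # folder/sub1/{zip.c, foo.c} and folder/sub2/{bar.c, bas.c}, as in the answer.
    for sub, names in [("sub1", ["zip.c", "foo.c"]), ("sub2", ["bar.c", "bas.c"])]:
        os.mkdir(os.path.join(root, sub))
        for name in names:
            open(os.path.join(root, sub, name), "w").close()
    os.listdir = counting_listdir
    try:
        found = find_files("c", root)
    finally:
        os.listdir = real_listdir
    print(len(found), items_returned)  # 4 files found; 6 items listed in total
```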