
Pythonic way to find the file with a given name closest to a specific directory location


I'm working on a project where I'm essentially trying to build a tree structure from a number of scattered XML files that are, sadly, not very consistently organized. Specifically, given a number of files with a given file extension, I want to find the XML document that dictates their layout. Luckily the document always has the same name, but it isn't always in the same location relative to the media files I'm trying to link it to. The most sensible workaround I've found is to look for the closest file with that name in the directory structure. However, the only way I've managed to do this in Python is by going up the directories one at a time and looking for the file in each one using os.walk. This is pretty slow, and I would like to do it for a large number of media files, so I'm looking for a more elegant solution. Below is some example code showing my present approach:

from os import listdir
from os.path import isfile, join

current_directory = "/path/to/example.mp3"
all_files = lambda path: [file for file in listdir(path) if isfile(join(path, file))]

filename = "test.xml"
found = None
while found is None and "/" in current_directory:
    # Drop the last path component to move one directory up.
    current_directory = current_directory[:current_directory.rfind("/")]
    # An empty string here means we have reached the filesystem root.
    current_files = all_files(current_directory or "/")
    if filename in current_files:
        found = join(current_directory or "/", filename)

The directory structure isn't so bad that the above method will ever reach two file instances at once, but I still feel like the above method is not very pythonic and is a lot more convoluted than it really needs to be. Any ideas?


Solution

  • os.walk is intelligent: when topdown is True, you can edit dirnames in place to specify which subdirectories to check.

    Using it, possibly with a simple state machine, will immediately make your code neater: there'll be no need for listdir, all_files or rfind hackery.
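
To illustrate the dirnames editing mentioned above, here is a minimal sketch of a downward search that prunes directories. The function name find_below and the skip list are hypothetical, chosen only for this example; the one os.walk behavior relied on is that, with topdown=True, directories removed from dirnames in place are not visited.

```python
import os

def find_below(root, target, skip=(".git", "__pycache__")):
    """Return the first path under root whose file name equals target, or None."""
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        # Slice assignment edits the list in place, so os.walk
        # will not descend into the directories that were removed.
        dirnames[:] = [d for d in dirnames if d not in skip]
        if target in filenames:
            return os.path.join(dirpath, target)
    return None
```

Rebinding the name (dirnames = [...]) would not prune anything, because os.walk keeps using the original list object.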

    There's no recursive tree search in your code, though, so there really is no need for os.walk(). If I understand you correctly, your code checks the current directory for an exact name, then each parent directory in turn, all the way up the filesystem.

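    import os

    path = os.path.dirname("/path/to/file.mp3")
    target = "test.xml"
    top = "/"
    found = None
    while True:
        candidate = os.path.join(path, target)
        if os.path.isfile(candidate):
            found = candidate  # found
            break
        parent = os.path.dirname(path)
        # parent == path is an alternative root check; it also stops relative paths
        if path == top or parent == path:
            break  # not found, found stays None
        path = parent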
    

    An alternative is to use a generator that yields the parent directories. That seems overcomplicated to me, although it is probably more pythonic:

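    import os

    def walk_up(path, top):
        """Yield path, then each parent directory, stopping after top."""
        while True:
            yield path
            parent = os.path.dirname(path)
            if path == top or parent == path:
                # A plain return ends the generator; raising StopIteration
                # inside one is a RuntimeError since Python 3.7 (PEP 479).
                return
            path = parent

    target = "test.xml"
    found = None
    for p in walk_up(os.path.dirname("/path/to/file.mp3"), "/"):
        p = os.path.join(p, target)
        if os.path.isfile(p):
            found = p  # found
            break
    else:
        pass  # not found, found stays None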
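
The same upward search can also be sketched with pathlib, which is not what the answer above uses but avoids writing the parent-directory loop by hand. The function name find_upwards is hypothetical, and the sketch assumes start is an absolute path to a media file; for a relative path, Path.parents stops at "." rather than at the filesystem root.

```python
from pathlib import Path

def find_upwards(start, target):
    """Return the nearest file named target at or above start's directory, or None."""
    start = Path(start)
    # Path.parents yields each ancestor directory, nearest first,
    # ending at the root for an absolute path.
    for directory in start.parents:
        candidate = directory / target
        if candidate.is_file():
            return candidate
    return None
```

For example, find_upwards("/path/to/file.mp3", "test.xml") checks /path/to, then /path, then /. If many media files share a directory, the per-directory result could be memoized, for instance with functools.lru_cache, though whether that helps depends on the layout.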