python, regex, path, mp3

Is this an efficient way of listing all .mp3 files in a directory (including any subdirectories) in Python?


Is this a good approach? Is there a more efficient way to do it (without trading code readability for efficiency)?

for root, dirs, files in os.walk(path, topdown=False):
    for name in files:
        if re.match(r'.*\.mp3', name):
            yield os.path.join(root, name) # yields the path of the .mp3 file

EDIT: Conclusion:

If you don't need recursion, the fastest way to do it is the glob module. If you do need recursion, switching from re.match() to slicing makes it a few milliseconds faster.
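
For reference, a minimal sketch of both variants. The function names mp3_flat and mp3_recursive are made up for this illustration, and name.endswith(".mp3") is used as the idiomatic spelling of the slice test name[-4:] == ".mp3". No timings are claimed here; measure on your own directory tree.

```python
import glob
import os


def mp3_flat(path):
    """Non-recursive: .mp3 files directly inside `path`, via glob."""
    return glob.glob(os.path.join(path, "*.mp3"))


def mp3_recursive(path):
    """Recursive: os.walk plus a plain suffix test instead of a regex."""
    for root, dirs, files in os.walk(path):
        for name in files:
            # Equivalent to the slice test name[-4:] == ".mp3"
            if name.endswith(".mp3"):
                yield os.path.join(root, name)
```

Both checks are case-sensitive, so a file named SONG.MP3 would be skipped by mp3_recursive on any platform.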


Solution

  • A recursive directory walker in Python should definitely use os.walk; that is the right choice. However, I would check the extension with os.path.splitext() instead of a regex. Also, return is probably not what you want here: it ends the iteration as soon as it hits the first .mp3 file. Replace it with yield, which turns the function into a generator. Call it from outside, and you can easily iterate over all .mp3 files in your directory tree.

    A working solution, test.py:

    import os
    
    def mp3gen():
        for root, dirs, files in os.walk('.'):
            for filename in files:
                if os.path.splitext(filename)[1] == ".mp3":
                    yield os.path.join(root, filename)
    
    for mp3file in mp3gen():
        print(mp3file)
    

    Test:

    $ mkdir testenv
    $ cd testenv
    $ mkdir subdir
    $ touch test.mp3
    $ touch subdir/test2.mp3
    $ touch foo.mp4
    $ python test.py
    ./test.mp3
    ./subdir/test2.mp3
    

    By the way, whatever you do, this iteration is unlikely to be the bottleneck in your workflow. If it is, I would use the find utility instead: run find . -name "*.mp3", pipe its output to your Python script, and read the paths from stdin with for line in sys.stdin.
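
The stdin side of that pipeline could look roughly like this. It is only a sketch: the helper name paths_from_lines is made up for illustration, and it assumes one path per line, so file names that themselves contain a newline are not handled.

```python
import sys


def paths_from_lines(lines):
    """Yield one path per non-empty input line, without the trailing newline."""
    for line in lines:
        path = line.rstrip("\n")
        if path:
            yield path
```

In the script, loop with for mp3file in paths_from_lines(sys.stdin) and process each path; then run it as find . -name "*.mp3" | python process.py, where process.py is a placeholder name for your script.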