Is this a good approach? Is there a more efficient way to do it (without having to trade code readability for efficiency)?
    for root, dirs, files in os.walk(path, topdown=False):
        for name in files:
            if re.match(r'.*\.mp3', name):
                yield os.path.join(root, name)  # yield the path of the .mp3 file
EDIT: Conclusion:
If you don't need recursion, the fastest way is the glob module. If you do need recursion, switching from re.match() to a slice comparison makes it a few milliseconds faster.
A Python-based recursive directory walker should definitely use os.walk; that is the right choice. However, I would check for the extension with os.path.splitext() instead of a regex. return is not what you want here, I guess: it terminates the iteration when the first mp3 file is hit. Replace it with yield. This turns the function into a generator; call it from the outside, and you can easily iterate over all mp3 files in your directory tree.
A working solution, test.py:
    import os

    def mp3gen():
        for root, dirs, files in os.walk('.'):
            for filename in files:
                if os.path.splitext(filename)[1] == ".mp3":
                    yield os.path.join(root, filename)

    for mp3file in mp3gen():
        print(mp3file)
Test:
$ mkdir testenv
$ cd testenv
$ mkdir subdir
$ touch test.mp3
$ touch subdir/test2.mp3
$ touch foo.mp4
$ python test.py
./test.mp3
./subdir/test2.mp3
By the way, whatever you do, it is unlikely that the performance of this iteration is the bottleneck in your workflow. If it is, I would actually use the find utility with find . -name "*.mp3", pipe its output to your Python script, and read the items from stdin with for line in sys.stdin.