Quick question regarding globbing in Python.
I have a directory of files named 'sync_0001.tif', 'sync_0002.tif', ... , 'sync_2400.tif'. I'd like to obtain 3 subset lists of those files: one for the first 800 files, one for the second 800 files, and one for the last 800 files. The only problem is the 0's before the numbers. I can't figure out the right way to glob and obtain those lists. The third list is easy because there are no 0's padding any of those files (s3 = glob.glob('sync_[1601-2400].tif')). The other two are trickier because the number of 0's out front varies.
I tried this, but got 'bad character range,' I'm guessing because of the 0's:
s1 = glob.glob('sync_' + '{[0001-0009], [0010-0099], [0100-0800]}' + '.tif')
s2 = glob.glob('sync_' + '{[0801-0999], [1000-1600]}' + '.tif')
I then tried moving the 0's out front like so, but got an empty list:
s1 = glob.glob('sync_' + '{000[1-9], 00[10-99], 0[100-800]}' + '.tif')
What's the best way to achieve these three lists? I'm starting to think I have the whole glob thing wrong, so if someone could shed some light that would be great. Thanks!
The fnmatch module underpinning the glob.glob() function is not nearly sophisticated enough for your task.
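To see why the patterns in the question cannot work, here is a small sketch using fnmatch directly (the filenames are only illustrative). A bracket expression matches a single character from a set, not a numeric range, and braces have no special meaning at all:

```python
import fnmatch

# A bracket expression matches exactly ONE character from a set.
# '[1601-2400]' is read as the characters 1, 6, 0, the range 1-2, then 4, 0, 0,
# i.e. any one of the digits 0, 1, 2, 4 or 6.
pattern = 'sync_[1601-2400].tif'

matches_four_digits = fnmatch.fnmatchcase('sync_1601.tif', pattern)
matches_one_digit = fnmatch.fnmatchcase('sync_4.tif', pattern)

# Braces get no special treatment: they only match themselves.
brace_pattern = 'sync_{0001,0002}.tif'
brace_expands = fnmatch.fnmatchcase('sync_0001.tif', brace_pattern)
brace_literal = fnmatch.fnmatchcase('sync_{0001,0002}.tif', brace_pattern)

print(matches_four_digits, matches_one_digit, brace_expands, brace_literal)
```

So even the 'sync_[1601-2400].tif' pattern would only ever match names with a single digit after the underscore, which is why none of the attempts can select a numeric range.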
Just grab all filenames and partition them after sorting:
filenames = sorted(glob.glob('sync_[0-9][0-9][0-9][0-9].tif'))
This works because your numbers are padded and can thus be sorted lexicographically. Then partition them:
s1 = [f for f in filenames if 0 < int(f[5:9]) <= 800]
s2 = [f for f in filenames if 800 < int(f[5:9]) <= 1600]
s3 = [f for f in filenames if 1600 < int(f[5:9]) <= 2400]
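Note that f[5:9] assumes the names come back without a directory prefix, which is the case when the glob pattern itself has none. If you glob with a path in front, the slice offsets shift. A minimal sketch of a more forgiving variant follows; file_number and the 'data/' prefix are hypothetical names used only for illustration:

```python
import os
import re

# Hypothetical helper: pull the 4-digit number out of names like
# 'sync_0042.tif', even when the path carries a directory prefix.
_NUMBER = re.compile(r'sync_(\d{4})\.tif$')

def file_number(path):
    match = _NUMBER.search(os.path.basename(path))
    if match is None:
        raise ValueError('unexpected filename: %r' % path)
    return int(match.group(1))

# Simulated result of sorted(glob.glob('data/sync_[0-9][0-9][0-9][0-9].tif')).
filenames = ['data/sync_%04d.tif' % n for n in range(1, 2401)]

s1 = [f for f in filenames if 0 < file_number(f) <= 800]
s2 = [f for f in filenames if 800 < file_number(f) <= 1600]
s3 = [f for f in filenames if 1600 < file_number(f) <= 2400]
```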
The directory I/O will be the slowest part here anyway. You can make this all a little more efficient by looping just once and switching which list you append to:
target = s1 = []
s2 = []
s3 = []
for f in filenames:
    num = int(f[5:9])
    # test the higher bound first, otherwise num > 800 would also catch 1601 and up
    if num > 1600:
        target = s3
    elif num > 800:
        target = s2
    target.append(f)
but for a task like this sticking to the simpler list comprehensions is just fine too.
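If you do want a single pass, another option is to compute the target list from the number with integer division. This is only a sketch under the assumption that every number lies between 1 and 2400; the names chunk_size and groups are mine, and the filenames are simulated rather than read from disk:

```python
# Simulated, already sorted filenames standing in for the glob result.
filenames = ['sync_%04d.tif' % n for n in range(1, 2401)]

chunk_size = 800
groups = [[], [], []]

for f in filenames:
    num = int(f[5:9])
    # 1..800 -> index 0, 801..1600 -> index 1, 1601..2400 -> index 2.
    # Assumes 1 <= num <= 2400; anything else would land in the wrong
    # list or raise IndexError.
    groups[(num - 1) // chunk_size].append(f)

s1, s2, s3 = groups
```

Unlike the loop that switches lists, this does not depend on the filenames being sorted first.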