
globbing a range with padded 0's - python


Quick question regarding globbing in python.

I have a directory of files named 'sync_0001.tif', 'sync_0002.tif', ... , 'sync_2400.tif'. I'd like to obtain three subset lists of those files: one for the first 800 files, one for the second 800, and one for the last 800. The only problem is the 0's before the numbers: I can't figure out the right way to glob for those lists. The third list seemed easy because none of those files have padding 0's (s3 = glob.glob('sync_[1601-2400].tif')). The other two are trickier because the number of leading 0's varies.

I tried this but got a 'bad character range' error, which I'm guessing is because of the 0's:

s1 = glob.glob('sync_' + '{[0001-0009], [0010-0099], [0100-0800]}' + '.tif')
s2 = glob.glob('sync_' + '{[0801-0999], [1000-1600]}' + '.tif')

I then tried moving the 0's out front like so, but got an empty list:

s1 = glob.glob('sync_' + '{000[1-9], 00[10-99], 0[100-800]}' + '.tif')

What's the best way to achieve these three lists? I'm starting to think I have the whole glob thing wrong, so if someone could shed some light that would be great. Thanks!


Solution

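  • The fnmatch module underpinning the glob.glob() function is not nearly sophisticated enough for your task, because glob patterns have no notion of numeric ranges. A bracket expression like [1601-2400] is a character class that matches exactly one character (here any of 0, 1, 2, 4 or 6), and braces are not special at all, so { and } are matched literally. The 'bad character range' error comes from [0001-0009], which contains the reversed range 1-0. None of your patterns, including the one for s3, will match four-digit names.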

    Just grab all filenames and partition them after sorting:

    import glob

    filenames = sorted(glob.glob('sync_[0-9][0-9][0-9][0-9].tif'))
    

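    This works because your numbers are zero-padded, so sorting the names lexicographically also puts them in numeric order. Then partition them by the four digits that follow the 5-character prefix sync_ (that is, f[5:9]); this slice assumes glob returns bare filenames, i.e. you are globbing in the current directory: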

    s1 = [f for f in filenames if 0 < int(f[5:9]) <= 800]
    s2 = [f for f in filenames if 800 < int(f[5:9]) <= 1600]
    s3 = [f for f in filenames if 1600 < int(f[5:9]) <= 2400]
    

    The directory I/O will be the slowest part here anyway, but you can make this a little more efficient by looping over the list once instead of three times and switching which list you append to:

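    s1, s2, s3 = [], [], []
    for f in filenames:
        num = int(f[5:9])
        # Check the higher bound first: every number above 1600 is also
        # above 800, so testing > 800 first would send them all to s2.
        if num > 1600:
            target = s3
        elif num > 800:
            target = s2
        else:
            target = s1
        target.append(f)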
    

    but for a task like this, sticking to the simpler list comprehensions is just fine too.
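To see why the original patterns fail, you can test them with fnmatch.fnmatch, which applies the same per-name matching rules glob uses. This is a small illustrative check, not part of the answer above; the example names such as sync_4.tif are made up to show what the character class actually accepts.

```python
import fnmatch

# Inside [...] there are no numbers, only single characters:
# '[1601-2400]' is the set {'0', '1', '2', '4', '6'}, because '1-2'
# is the only range in it.
pattern = 'sync_[1601-2400].tif'

print(fnmatch.fnmatch('sync_2000.tif', pattern))  # False: the class matches one character
print(fnmatch.fnmatch('sync_4.tif', pattern))     # True: '4' is in the set
print(fnmatch.fnmatch('sync_5.tif', pattern))     # False: '5' is not

# Braces have no special meaning, so '{' and '}' must appear literally.
brace_pattern = 'sync_{000[1-9]}.tif'
print(fnmatch.fnmatch('sync_0001.tif', brace_pattern))    # False
print(fnmatch.fnmatch('sync_{0001}.tif', brace_pattern))  # True
```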
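If you'd rather not depend on the fixed f[5:9] slice (which breaks if glob returns paths with a directory in front), a regular expression can pull the number out of each name instead. This swaps in the re module in place of slicing; it is a minimal sketch assuming the sync_NNNN.tif naming from the question, and NAME_RE and partition_by_number are hypothetical names chosen for illustration.

```python
import glob
import os
import re

# Hypothetical names: NAME_RE and partition_by_number are illustrative only.
# Assumes files are named sync_ followed by exactly four digits and .tif.
NAME_RE = re.compile(r'sync_(\d{4})\.tif')


def partition_by_number(paths, size=800, count=3):
    """Split paths into `count` lists of `size` consecutive numbers:
    1..size, size+1..2*size, and so on. Names that don't match NAME_RE,
    or whose number falls outside 1..size*count, are skipped."""
    buckets = [[] for _ in range(count)]
    for path in paths:
        # basename() lets this work for paths like 'data/sync_0001.tif' too
        m = NAME_RE.fullmatch(os.path.basename(path))
        if m is None:
            continue
        num = int(m.group(1))
        index = (num - 1) // size  # 1..800 -> 0, 801..1600 -> 1, 1601..2400 -> 2
        if 0 <= index < count:
            buckets[index].append((num, path))
    # Sort each bucket by the parsed number, then drop the numbers.
    return [[path for _, path in sorted(bucket)] for bucket in buckets]


s1, s2, s3 = partition_by_number(glob.glob('sync_*.tif'))
```

Because each file's bucket is computed from its own number, the input order doesn't matter, and the same function works for other chunk sizes by changing size and count.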
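Since the numbering is known in advance, another option is to skip pattern matching entirely and build the zero-padded names yourself with a format spec such as {n:04d}, keeping only the files that exist. This is an alternative technique, not what the answer above does; padded_names and its parameters are hypothetical, and the sketch assumes the files sit in `directory` (the current directory by default).

```python
import os


def padded_names(start, stop, directory='', prefix='sync_', suffix='.tif', width=4):
    """Hypothetical helper: build zero-padded names such as sync_0001.tif for
    start <= n <= stop and keep only those that exist in `directory`."""
    # {n:0{width}d} pads n with leading zeros to `width` digits.
    names = (f'{prefix}{n:0{width}d}{suffix}' for n in range(start, stop + 1))
    # os.path.join('', name) is just name, so the default gives bare filenames.
    paths = (os.path.join(directory, name) for name in names)
    return [path for path in paths if os.path.exists(path)]


s1 = padded_names(1, 800)
s2 = padded_names(801, 1600)
s3 = padded_names(1601, 2400)
```

This trades a single directory listing for one os.path.exists() call per candidate name, which is usually fine for a few thousand files.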