
Regular expression and Python glob


I have a folder with 12500 pictures. The filenames are just numbers, so the folder looks like this:

0.jpg
1.jpg
2.jpg
3.jpg
...
12499.jpg

Now I want to move the files. Files in the range 0-7999 should be copied to the first folder, files 8000-9999 to the second folder, and files 10000-12499 to the third folder.

First, I thought I could simply use [0-7999].jpg for the first folder, [8000-9999].jpg for the second and [10000-12499].jpg for the third. However, this does not work. I then came up with the following code, based on the wildcards I know, ? and *. It works and does the job (note that I commented out shutil.copy and use print instead to check the result):

import glob
import shutil

# 0-9: one-character names
dest_dir = "/tmp/folder1/"
for file in glob.glob('/tmp/source/?.jpg'):
    #shutil.copy(file, dest_dir)
    print(file)

# 10-99: two-character names
dest_dir = "/tmp/folder1/"
for file in glob.glob('/tmp/source/??.jpg'):
    #shutil.copy(file, dest_dir)
    print(file)

# 100-999: three-character names
dest_dir = "/tmp/folder1/"
for file in glob.glob('/tmp/source/???.jpg'):
    #shutil.copy(file, dest_dir)
    print(file)

# 1000-7999: first digit 1-7, then three more characters
dest_dir = "/tmp/folder1/"
for file in glob.glob('/tmp/source/[1-7]???.jpg'):
    #shutil.copy(file, dest_dir)
    print(file)

# 8000-9999: first digit 8 or 9, then three more characters
dest_dir = "/tmp/folder2/"
for file in glob.glob('/tmp/source/[8-9]???.jpg'):
    #shutil.copy(file, dest_dir)
    print(file)

# 10000-12499: any five-character name
dest_dir = "/tmp/folder3/"
for file in glob.glob('/tmp/source/?????.jpg'):
    #shutil.copy(file, dest_dir)
    print(file)
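The six loops above differ only in the glob pattern and the destination, so they can be driven from one list of (pattern, folder) pairs. This is a minimal sketch, assuming the source folder holds only numerically named .jpg files (? matches any character, not just digits); the names `distribute`, `PATTERNS`, `target_root` and `transfer` are hypothetical.

```python
import glob
import os
import shutil

# Hypothetical table replacing the six loops: (glob pattern, destination folder).
# "?" matches exactly one character and "[1-7]" exactly one digit, so this
# assumes the source folder holds only numerically named .jpg files.
PATTERNS = [
    ("?.jpg", "folder1"),         # 0-9
    ("??.jpg", "folder1"),        # 10-99
    ("???.jpg", "folder1"),       # 100-999
    ("[1-7]???.jpg", "folder1"),  # 1000-7999
    ("[8-9]???.jpg", "folder2"),  # 8000-9999
    ("?????.jpg", "folder3"),     # 10000-12499 (any five-character name)
]


def distribute(source_dir, target_root, transfer=shutil.copy):
    """Apply transfer (shutil.copy or shutil.move) to every file matched by
    PATTERNS and return {folder: [base names handled]}."""
    handled = {}
    for pattern, folder in PATTERNS:
        dest_dir = os.path.join(target_root, folder)
        os.makedirs(dest_dir, exist_ok=True)
        # glob.escape stops special characters in source_dir acting as wildcards
        for path in glob.glob(os.path.join(glob.escape(source_dir), pattern)):
            transfer(path, dest_dir)
            handled.setdefault(folder, []).append(os.path.basename(path))
    return handled
```

For example, `distribute("/tmp/source", "/tmp")` copies into /tmp/folder1, /tmp/folder2 and /tmp/folder3, and passing `transfer=shutil.move` moves the files instead.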

However, I would like a more elegant solution. I searched for a regular expression that matches an integer range and tried the following:

dest_dir = "/tmp/folder3/"
for file in glob.glob('/tmp/source/\b([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|1000).jpg'):
    #shutil.copy(file, dest_dir)
    print(file)

This does not work. So what does a correct implementation look like? I need a solution for both shutil.copy and shutil.move, but I think it is the same for both. I need a regular expression solution here, as I expect it would be just one line of code, if one only knew the correct expression. I do not want to iterate over the files and extract the numbers/values myself, as in this solution (or any other solution that avoids finding the correct regular expression). So my problem really is about the regular expression.
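Both failed attempts can be reproduced with `fnmatch.fnmatchcase`, which applies the same wildcard rules that glob relies on: a bracket expression matches exactly one character, and regex syntax such as `\b`, `(`, `|` and `)` has no special meaning. The file names below are made-up examples.

```python
from fnmatch import fnmatchcase

# "[0-7999]" is a set of ONE character (0-7 or 9), not the numbers 0 to 7999.
print(fnmatchcase("5.jpg", "[0-7999].jpg"))    # True
print(fnmatchcase("8.jpg", "[0-7999].jpg"))    # False: 8 is not in the set
print(fnmatchcase("500.jpg", "[0-7999].jpg"))  # False: the set covers one character only

# In a glob pattern "\b", "(", "|" and ")" are matched literally,
# so no ordinary numeric file name can match this attempt.
attempt = r"\b([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|1000).jpg"
print(fnmatchcase("5.jpg", attempt))           # False
```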


Solution


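  • glob patterns are not regular expressions: they only understand *, ? and [...] character sets, so the alternation pipe (|) and \b you used have no special meaning there. It's better to let glob list all .jpg files, filter that list with a pattern from the re module in one line, and then iterate over the result. You have three ranges, so you need three loops, one per pattern. The one for the third folder (10000-12499) looks like this. Note that the pattern is matched against the whole base name with re.fullmatch, because an unanchored re.search would match the last digit of every file name:

    import glob
    import os
    import re
    import shutil
    
    dest_dir = "/tmp/folder3/"
    # 10000-11999 or 12000-12499, anchored to the whole file name
    pattern = re.compile(r'(1[01][0-9]{3}|12[0-4][0-9]{2})\.jpg')
    for file in [f for f in glob.glob("/tmp/source/*.jpg") if pattern.fullmatch(os.path.basename(f))]:
        #shutil.copy(file, dest_dir)
        print(file)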
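Extending the answer to all three ranges, one option is to keep one anchored pattern per folder and take the transfer function as a parameter, so the same code works with shutil.copy or shutil.move. This is a sketch, assuming file names have no leading zeros; `distribute`, `RANGES`, `target_root` and the folder names are hypothetical.

```python
import glob
import os
import re
import shutil

# Hypothetical (pattern, folder) pairs; each pattern is matched against the
# whole base name with fullmatch, so "12.jpg" cannot match via its last digit.
RANGES = [
    (r"(?:[0-9]|[1-9][0-9]{1,2}|[1-7][0-9]{3})\.jpg", "folder1"),  # 0-7999
    (r"[89][0-9]{3}\.jpg", "folder2"),                            # 8000-9999
    (r"(?:1[01][0-9]{3}|12[0-4][0-9]{2})\.jpg", "folder3"),       # 10000-12499
]


def distribute(source_dir, target_root, transfer=shutil.copy):
    """Copy (default) or move (transfer=shutil.move) each matching file into
    target_root/<folder>; return {folder: [sorted matched base names]}."""
    result = {}
    all_jpgs = os.path.join(glob.escape(source_dir), "*.jpg")
    for pattern, folder in RANGES:
        dest_dir = os.path.join(target_root, folder)
        os.makedirs(dest_dir, exist_ok=True)
        regex = re.compile(pattern)
        # One-line filter: glob lists candidates, the regex selects the range
        matched = [f for f in glob.glob(all_jpgs) if regex.fullmatch(os.path.basename(f))]
        for path in matched:
            transfer(path, dest_dir)
        result[folder] = sorted(os.path.basename(p) for p in matched)
    return result
```

Because the three ranges do not overlap, moving files in one iteration cannot change what the later patterns match; names outside all ranges (for example 12500.jpg) are simply left in place.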