I have a folder with 12500 pictures. The filenames contain the numbers, so it looks like:
0.jpg
1.jpg
2.jpg
3.jpg
.
.
.
12499.jpg
Now I want to split the files across three folders. Files 0-7999 should be copied to the first folder, files 8000-9999 to the second folder, and files 10000-12499 to the third folder.
First, I thought I could simply use [0-7999].jpg for the first folder, [8000-9999].jpg for the second and [10000-12499].jpg for the third. However, this does not work. Using the only wildcards I know, ? and *, I came up with the following code. It works and does the job (note that I commented out shutil.copy and use print instead to check the result):
import glob
import shutil
dest_dir = "/tmp/folder1/"
for file in glob.glob('/tmp/source/?.jpg'):
    #shutil.copy(file, dest_dir)
    print(file)
dest_dir = "/tmp/folder1/"
for file in glob.glob('/tmp/source/??.jpg'):
    #shutil.copy(file, dest_dir)
    print(file)
dest_dir = "/tmp/folder1/"
for file in glob.glob('/tmp/source/???.jpg'):
    #shutil.copy(file, dest_dir)
    print(file)
dest_dir = "/tmp/folder1/"
for file in glob.glob('/tmp/source/[1-7]???.jpg'):
    #shutil.copy(file, dest_dir)
    print(file)
dest_dir = "/tmp/folder2/"
for file in glob.glob('/tmp/source/[8-9]???.jpg'):
    #shutil.copy(file, dest_dir)
    print(file)
dest_dir = "/tmp/folder3/"
for file in glob.glob('/tmp/source/?????.jpg'):
    #shutil.copy(file, dest_dir)
    print(file)
However, I would like to have an elegant solution for this. I googled regular expression with integer range and tried the following:
dest_dir = "/tmp/folder3/"
for file in glob.glob('/tmp/source/\b([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|1000).jpg'):
    #shutil.copy(file, dest_dir)
    print(file)
This does not work. So what does a correct implementation look like? I need a solution for both shutil.copy and shutil.move, but I think it is the same for both. I am specifically looking for a regular expression solution, as I expect this could be solved in a single line of code if one only knew the correct expression. I do not want to iterate over the files and extract the numbers myself, like in this solution (or any other solution that avoids finding the correct regular expression). So my problem really is about the regular expression.
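As a quick check of why the two attempts above fail, here is a minimal sketch using fnmatch.fnmatchcase, on the assumption that glob applies the same matching rules as the fnmatch module. The list of sample names is made up for illustration. In a glob pattern, [0-7999] is a character class that matches exactly one character (0 to 7, or 9), not a number range, and | is treated as a literal character.

```python
from fnmatch import fnmatchcase

# Hypothetical sample names, chosen only to illustrate the matching rules.
names = ["5.jpg", "8.jpg", "9.jpg", "42.jpg", "7999.jpg"]

# [0-7999] is a character class: one character that is 0-7 or 9.
bracket_matches = [n for n in names if fnmatchcase(n, "[0-7999].jpg")]

# The pipe has no special meaning in a glob pattern; it is a literal "|".
pipe_matches = [n for n in names if fnmatchcase(n, "5|8.jpg")]

print(bracket_matches)  # ['5.jpg', '9.jpg']
print(pipe_matches)     # []
```

So the bracket pattern only ever selects single-digit names, and the alternation attempt selects nothing unless a file name literally contains a pipe.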
glob patterns are not regular expressions and don't support the alternation pipe symbol (|) that you used. It's better to use a regex pattern (re) to build your desired file list in one line and then iterate over it. You have 3 ranges, so you need 3 for loops to do this. One of them, using the regex you mentioned, looks as follows:
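import os
import re
import glob
import shutil
dest_dir = "/tmp/folder3/"
# fullmatch on the base name: an unanchored re.search would accept any name ending in a digit plus ".jpg"
pattern = re.compile(r'([0-9]|[1-9][0-9]|[1-9][0-9][0-9]|1000)\.jpg')
for file in [f for f in glob.glob("/tmp/source/*.jpg") if pattern.fullmatch(os.path.basename(f))]:
    #shutil.copy(file, dest_dir)
    print(file)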
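For the three ranges in the question, the expressions could look like the sketch below. This is one possible way to write them, not the only one; the names RANGES, destination_for and copy_all are made up for illustration, and the patterns assume file names without leading zeros, as in the listing in the question. Each pattern is applied with fullmatch to the base name, so a longer number cannot match by accident.

```python
import glob
import os
import re
import shutil

# One anchored pattern per destination folder (paths taken from the question).
RANGES = {
    "/tmp/folder1/": re.compile(r"(0|[1-9][0-9]{0,2}|[1-7][0-9]{3})\.jpg"),  # 0-7999
    "/tmp/folder2/": re.compile(r"[89][0-9]{3}\.jpg"),                       # 8000-9999
    "/tmp/folder3/": re.compile(r"(1[01][0-9]{3}|12[0-4][0-9]{2})\.jpg"),    # 10000-12499
}


def destination_for(name):
    """Return the destination folder for a base file name, or None if no range matches."""
    for dest_dir, pattern in RANGES.items():
        if pattern.fullmatch(name):
            return dest_dir
    return None


def copy_all(source_dir="/tmp/source", transfer=shutil.copy):
    """Copy (or move, with transfer=shutil.move) every matching file to its folder."""
    for path in glob.glob(os.path.join(source_dir, "*.jpg")):
        dest_dir = destination_for(os.path.basename(path))
        if dest_dir is not None:
            transfer(path, dest_dir)


print(destination_for("7999.jpg"))   # /tmp/folder1/
print(destination_for("8000.jpg"))   # /tmp/folder2/
print(destination_for("12499.jpg"))  # /tmp/folder3/
print(destination_for("12500.jpg"))  # None
```

Calling copy_all() copies the files, and copy_all(transfer=shutil.move) moves them instead, since both functions accept a source path and a destination directory.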