Tags: python, python-3.x, glob

Using glob to find duplicate filenames with the same number in it


I am currently writing a script that cycles through all the files in a folder and renames them according to a naming convention.

What I would like to achieve is the following: if the script finds two files that have the same number in the filename (e.g. '101 test' and '101 real'), it should move those two files to a different folder named 'duplicates'.

My original plan was to use glob to cycle through all the files in the folder and add every file containing a certain number to a list. I would then check the length of the list, and if it exceeded 1 (i.e. at least two files share the same number), the files would be moved to this 'duplicates' folder. However, for some reason this does not work.

Here is my code. I was hoping someone with more experience than me could give me some insight into how to achieve my goal. Thanks!

app = askdirectory(parent=root)




for x in range(804):
    listofnames = []
    real = os.path.join(app, '*{}*').format(x)
    for name in glob.glob(real):
        listofnames.append(name)
        y = len(listofnames)
        if y > 1:
            for names in listofnames:
                path = os.path.join(app, names)
                shutil.move(path,app + "/Duplicates")

Solution
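
As background on why the glob-based loop in the question misfires, here is a minimal sketch (the filenames are made up for illustration, and matches_for is a hypothetical helper). A pattern such as '*1*' matches any name that contains the character 1, so a number that is a substring of another number pulls in unrelated files. fnmatch.filter applies the same shell-style matching that glob relies on, without touching the disk.

```python
import fnmatch

# Hypothetical filenames, used only for illustration.
names = ["101 test", "101 real", "10 draft", "7 notes"]


def matches_for(number, names):
    """Return the names matched by the question's pattern '*{number}*'."""
    return fnmatch.filter(names, "*{}*".format(number))


# '*1*' and '*10*' also match the '101 ...' files, not just exact numbers.
print(matches_for(1, names))
print(matches_for(10, names))
print(matches_for(101, names))
```

Because the pattern matches substrings, the x = 1 and x = 10 iterations would already treat these files as duplicates of each other, which is one reason grouping by the extracted number is the more robust approach.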

  • A simple way is to collect filenames with numbers in a structure like this:

    numbers = {
         101: ['101 test', '101 real'],
         93: ['hugo, 93']
    }
    

    Then, if a list in this dict holds more than one filename, move those files.

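    import os
    import re
    import shutil
    from collections import defaultdict
    from tkinter.filedialog import askdirectory
    
    # root is the Tk window from the question's script
    app = askdirectory(parent=root)
    # maps each number to the list of filenames that contain it
    numbers = defaultdict(list)
    
    # list all files in this dir
    for filename in os.listdir(app):
        # \d+ means a decimal number of any length (first one in the name)
        match = re.search(r'\d+', filename)
    
        if match is None:
            # no digits found
            continue
    
        # extract the number
        number = int(match.group())
    
        # defaultdict creates the empty list on first access
        numbers[number].append(filename)
    
    # make sure the target folder exists; otherwise shutil.move
    # would rename the first file to "Duplicates"
    duplicates = os.path.join(app, "Duplicates")
    os.makedirs(duplicates, exist_ok=True)
    
    for number, filenames in numbers.items():
        if len(filenames) < 2:
            # not a dupe
            continue
        for filename in filenames:
            shutil.move(os.path.join(app, filename), duplicates)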
    

    The defaultdict magic is just shorthand for the following code:

        if number not in numbers:
            numbers[number] = list()
        numbers[number].append(filename)
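
To try the approach without a folder dialog, it can be wrapped in a function and run against a temporary directory. This is only a sketch under a few assumptions: move_duplicates and its target_name parameter are hypothetical names, the demo filenames are made up, and only regular files are considered so that the target folder itself is never treated as a candidate.

```python
import os
import re
import shutil
import tempfile
from collections import defaultdict


def move_duplicates(folder, target_name="Duplicates"):
    """Move files that share their first number into folder/target_name.

    Returns the sorted list of filenames that were moved.
    """
    numbers = defaultdict(list)
    for filename in os.listdir(folder):
        # Skip sub-folders (including the target folder itself).
        if not os.path.isfile(os.path.join(folder, filename)):
            continue
        match = re.search(r"\d+", filename)
        if match is not None:
            numbers[int(match.group())].append(filename)

    target = os.path.join(folder, target_name)
    moved = []
    for filenames in numbers.values():
        if len(filenames) < 2:
            continue  # number is unique, nothing to move
        os.makedirs(target, exist_ok=True)
        for filename in filenames:
            shutil.move(os.path.join(folder, filename),
                        os.path.join(target, filename))
            moved.append(filename)
    return sorted(moved)


# Small demo in a throwaway directory.
with tempfile.TemporaryDirectory() as demo:
    for name in ["101 test.txt", "101 real.txt", "93 hugo.txt"]:
        open(os.path.join(demo, name), "w").close()
    print(move_duplicates(demo))
```

Passing the full destination path to shutil.move, rather than only the folder, avoids the case where a missing folder would turn the move into a rename.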