Search code examples
pythonregexfile-rename

File renaming - match patterns and rename only matches. After delete all mismatches


I am trying to rename several files (thousands) with the following algorithm in Python: There is a list of word-patterns. Each entry starts on a new line and contains text data of one or more words. For example:

    r'House',
    r'Big Tree',
    r'Word',

Next, the script takes the names of the files in the folder and compares each pattern (Text Data String) and looks for matches in the file name. If the script finds one or more matches, it removes everything except them (and the extension) from the file name. And the remaining matches are renamed according to the pattern with which the comparison was made.

The search ignores case, special characters and different delimiters like '-', '_', ',' (of which there can be more than one). When searching for several words ('Big Tree'), all possible combinations like: __big_tree_; big,tree_ etc.

Here are the original file names for an example:

  • __big_tree_and_5_more_house_and_1_more_drawn_by_a_n_name__f95c144ac5d5cf0a0baa24d6593cc699.jpg
  • Table-Home-Word-7325313.jpeg
  • kiwi 40055619 virtual YouTube,Search EN, 100+ bookmarks 115480616_p0.png

The output should be:

  • Big Tree House.jpeg
  • Word.jpeg
  • kiwi 40055619 virtual YouTube,Search EN, 100+ bookmarks 115480616_p0.png

Since the last file did not have a name match, it was not renamed

I am new to Python and can't figure out how to implement a similar task.

All the solutions found on the internet mostly work with a single pattern, whereas I will have thousands of them. Ideally, I am trying to implement a variant in which the text database will be stored separately and separately a universal regular expression that allows to search for matches.

Thank you all in advance.

After much trial and error, the best I could do was the following code: Unfortunately it is not fully functional and due to moving files to a temporary folder it duplicates files when restarted

import os
import re
import shutil

# Patterns that will be checked for matches in file names
patterns = [
    r'House',
    r'Bit Tree',
    r'Word',
    r'Word2',
    r'Home',
]

# Filename filtering
def filter_filename(filename):
    matches = []
    for pattern in patterns:
        match = re.search(pattern, filename)
        if match:
            matches.append(match.group())
    return ' '.join(matches)

# File Renaming sending them to a temporary directory
def rename_files(directory):
    temp_dir = os.path.join(directory, 'temp')
    os.makedirs(temp_dir, exist_ok=True)
    for filename in os.listdir(directory):
        old_filepath = os.path.join(directory, filename)
        if os.path.isfile(old_filepath):
            base_filename, file_extension = os.path.splitext(filename)
            new_filename = filter_filename(base_filename)
            if new_filename:
                new_filename = new_filename
                new_filepath = os.path.join(temp_dir, new_filename)
                # Adding a one for uniqueness of the name
                count = 1
                while os.path.exists(new_filepath):
                    if count < 9999:
                        new_filename_with_count = f"{new_filename} {count}{file_extension}"
                        new_filepath = os.path.join(temp_dir, new_filename_with_count)
                        count += 1
                    else:
                        break
                shutil.move(old_filepath, new_filepath)
    # Returning from a temporary folder
    for filename in os.listdir(temp_dir):
        temp_filepath = os.path.join(temp_dir, filename)
        new_filepath = os.path.join(directory, filename)
        shutil.move(temp_filepath, new_filepath)
    os.rmdir(temp_dir)

# Directory for processing
directory = './.'

rename_files(directory)


Solution

  • Here's a simpler solution, which leaves out some of the extra work you're doing (like creating a target directory, temp files, etc.) and which focuses on the following:

    • list all files in a source location
    • for each file, check if the name has any of the key phrases
    • in checking, allow any whitespace in a key phrase to be replaced by any combination of -, _, , or whitespace (of any length > 0), and be case-insensitive
    • move matching files from the source location to a destination location

    The script doesn't currently actually move the files, but just prints what it would be moving, so you can verify it works.

    import re
    import os
    import shutil
    
    
    def move_matching_files(source, destination, key_phrases, log=True, move=False):
        # replace whitespace in the keywords with regex pattern
        # `patterns` will be the patterns for the key_phrases
        patterns = [r'[_\-,\s]+'.join(key_phrase.split()) for key_phrase in key_phrases]
    
        for file_name in os.listdir(source):
            for pattern in patterns:
                # whole pattern: anything followed by a key phrase, followed by anything
                if re.match(f'.*{pattern}.*', file_name, re.IGNORECASE):
                    # when matched, a move needs to be executed, break from the search
                    break
            else:
                # if no phrase matches, don't move, continue with the next file_name
                continue
            source_fn = os.path.join(source, file_name)
            destination_fn = os.path.join(destination, file_name)
            if log:
                print(f'Moving: {source_fn} to {destination_fn}')
            if move:
                shutil.move(source_fn, destination_fn)
    
    
    move_matching_files('.', 'temp', ['house', 'big tree'])
    

    If you don't want it to print, and do want it to move, you'd call it like:

    move_matching_files('.', 'temp', ['house', 'big tree'], log=False, move=True)