windows powershell batch-file vbscript batch-rename

Script to compare two different folder contents and rename them based on minimum similarity


Story: I have multiple folders, each with 1000+ files. The files in different folders relate to the same content and have similar, but slightly different, names. For example, one folder has a file named simply "Jobs to do.doc" and another folder has "Jobs to do (UK) (Europe).doc", etc.

This is on Windows 10, not Linux.

Question: Is there a script that compares the contents of each folder and renames the files based on minimum similarity? The end result would be to remove all the jargon so that each file has the same name in every folder but STILL remains in its respective folder. Basically: compare the contents of multiple folders to one folder's contents and rename the files so that each file is named the same in every folder.

Example:

D:/Folder1/Name_Of_File1.jpeg
D:/Folder2/Name_Of_File1 (Europe).jpeg
D:/Folder3/Name_of_File1_(Random).jpeg

D:/folder1/another_file.doc
D:/Folder2/another_file_(date_month_year).txt
D:/Folder3/another_file(UK).XML

I have used different file extensions in the above example in the hope that someone can write a script that ignores file extensions.

I hope this makes sense. So I need either a script that removes the content in brackets and keeps the files intact, or one that renames ALL files across all folders based on minimum similarity.

The problem is that there are 1000+ files in each folder, so I want to run it as an automated job.

Thanks in advance.


Solution

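  • If the stuff you want to get rid of is always in brackets, then you could write a regex like (.*?)([\s_]*\(.*\)). Group 1 captures the part of the name before the first opening bracket, without any spaces or underscores directly in front of it.

    Try something like this: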

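    # Files only; subfolders are left alone
    $folder = Get-ChildItem -LiteralPath 'C:\TestFolder' -File
    # Group 1 = name before the first "(", minus spaces/underscores in front of it
    $regex = '(.*?)([\s_]*\(.*\))'
    foreach ($file in $folder){
        if ($file.BaseName -match $regex){
            # Uncomment -WhatIf for a dry run that only prints the planned renames
            Rename-Item -LiteralPath $file.FullName -NewName "$($matches[1])$($file.Extension)" -Verbose #-WhatIf
        }
    }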
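
    To see what the pattern keeps before touching any files, it can be tried on plain strings. The sketch below uses an anchored variant of the pattern, `^(.*?)[\s_]*\(.*\)`, which is an assumption rather than the exact regex above, and base names taken from the question's examples.

    ```powershell
    # Assumed variant of the pattern: anchored at the start of the name.
    $regex = '^(.*?)[\s_]*\(.*\)'

    # Base names from the question's examples (extensions already stripped).
    $samples = 'Name_Of_File1 (Europe)',
               'Name_of_File1_(Random)',
               'another_file_(date_month_year)',
               'another_file(UK)',
               'Jobs to do (UK) (Europe)',
               'Name_Of_File1'

    # Keep group 1 when the pattern matches, otherwise keep the name unchanged.
    $cleaned = foreach ($name in $samples) {
        if ($name -match $regex) { $Matches[1] } else { $name }
    }
    $cleaned
    # Name_Of_File1
    # Name_of_File1
    # another_file
    # another_file
    # Jobs to do
    # Name_Of_File1
    ```

    Note that everything from the first "(" to the last ")" is dropped, and so is any text after the last ")", because only group 1 is kept. Names that differ only in letter case (Name_Of_File1 vs Name_of_File1) are left as they are.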
    

    To check consistency, you could run a precheck using the same regex:

    # Strip the bracketed part from each base name (files only); names that don't match are kept as-is
    $folder1 = Get-ChildItem 'D:\T1' -File | ForEach-Object {if ($_.BaseName -match $regex){$matches[1]}else{$_.BaseName}}
    $folder2 = Get-ChildItem 'D:\T2' -File | ForEach-Object {if ($_.BaseName -match $regex){$matches[1]}else{$_.BaseName}}
    # Compare the base names in the two folders - if all are the same, nothing is returned
    Compare-Object $folder1 $folder2
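
    When the folders do differ, Compare-Object returns one object per unmatched name. A small sketch with made-up name lists (the values below are sample data, not real folder contents) shows how to read the result:

    ```powershell
    # Sample cleaned base names, standing in for the two folder listings.
    $folder1 = 'Name_Of_File1', 'another_file', 'only_in_one'
    $folder2 = 'name_of_file1', 'another_file', 'only_in_two'

    # Compare-Object ignores case unless -CaseSensitive is specified,
    # so Name_Of_File1 and name_of_file1 count as equal here.
    $diff = Compare-Object -ReferenceObject $folder1 -DifferenceObject $folder2

    # SideIndicator '<=' : name exists only in $folder1
    # SideIndicator '=>' : name exists only in $folder2
    $diff
    ```

    Here only only_in_one (marked <=) and only_in_two (marked =>) are reported, which is the list of files that would still need attention after the rename.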
    

    Maybe you could build on that idea.
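
    One way to build on it for several folders is sketched below. Rename-BracketedFile is a hypothetical helper name, not a built-in cmdlet, and the anchored pattern and the folder paths in the usage line are assumptions. The function plans all renames for a folder first and skips any rename whose target name would be shared by more than one file, since stripping brackets can make two names collide (for example "c.txt" and "c (EU).txt").

    ```powershell
    # Hypothetical helper: strips bracketed suffixes from file names in each
    # folder and skips renames that would produce a duplicate name.
    function Rename-BracketedFile {
        [CmdletBinding(SupportsShouldProcess)]
        param(
            [Parameter(Mandatory)] [string[]] $Folder,
            [string] $Pattern = '^(.*?)[\s_]*\(.*\)'   # assumed pattern
        )
        foreach ($dir in $Folder) {
            # Plan first: pair each file with the name it would receive.
            $plan = @(Get-ChildItem -LiteralPath $dir -File | ForEach-Object {
                $base = $_.BaseName
                # Keep the original name if stripping would leave nothing.
                if ($base -match $Pattern -and $Matches[1]) { $base = $Matches[1] }
                [pscustomobject]@{ File = $_; NewName = "$base$($_.Extension)" }
            })
            # Target names shared by more than one file (grouping ignores case).
            $clashes = @($plan | Group-Object NewName | Where-Object Count -gt 1 |
                         ForEach-Object Name)
            foreach ($item in $plan) {
                if ($item.NewName -ceq $item.File.Name) { continue }  # nothing to strip
                if ($clashes -contains $item.NewName) {
                    Write-Warning "Skipped '$($item.File.FullName)': target name is not unique"
                    continue
                }
                Rename-Item -LiteralPath $item.File.FullName -NewName $item.NewName
            }
        }
    }

    # Dry run first; drop -WhatIf once the preview looks right.
    # Rename-BracketedFile -Folder 'D:\Folder1', 'D:\Folder2', 'D:\Folder3' -WhatIf
    ```

    Extensions are kept as they are, so the same base name can exist as .jpeg in one folder and .txt in another. This sketch only strips brackets; it does not attempt any fuzzy "minimum similarity" matching between names.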