
DOS script to remove identical files by content


I have a directory with several thousand text files that I need to process. Some of these files are identical, while others are identical except for a timestamp that varies by a few seconds or milliseconds. I need a way to automate the deletion of the duplicates and keep only one copy.

I'm thinking of something like:

while there are still files in the directory
{
    get file                    // e.g., file0001

    while (file == file + 1)    // e.g., file0001 == file0002 using 'fc' command
    {
        delete file + 1
    }

    move file to another directory
}

Is something like this even possible in Microsoft Windows Server 2003's DOS?


Solution

  • Of course it is. Everything is possible in batch. :D

    This batch doesn't actually delete files. It just echoes the result of each comparison. When it reports two files as identical, you can delete either one of them.

    Save the code as CleanDuplicates.bat and start it with CleanDuplicates {Folder}.

    Provided AS IS, without any guarantees! I don't want you knocking on my door because your entire server is messed up. ;-)

    The script calls itself recursively. This could probably be done in a different way, but hey, it works. It also restarts itself in a new cmd shell, because that makes cleaning up easier. I tested the script on Windows Vista Business, but it should work on Server 2003 as well. Hey, it even has a help function. ;-)

    It contains two nested loops that each return every file, so every pair is compared twice (A with B, and later B with A). Once you implement the actual deleting, it may therefore report that some files don't exist, because they were deleted in an earlier iteration.

    @echo off
    rem Check input. //, /// and //// are special parameters. No parameter -> help.
    if %1check==//check goto innerloop
    if %1check==///check goto compare
    if %1check==////check goto shell
    if %1check==/hcheck goto help
    if %1check==check goto help
    
    rem Start ourselves within a new cmd shell. This will automatically return to
    rem the original directory, and clear our helper environment vars.
    cmd /c %0 //// %1
    echo Exiting
    goto end
    
    :shell
    rem Save the current folder, jump to target folder and set some helper vars
    set FCOrgFolder=%CD%
    rem cd /d also changes the drive if the target folder is on a different one.
    cd /d %2
    set FCStartPath=%0
    if not exist %FCStartPath% set FCStartPath=%FCOrgFolder%\%0
    
    rem Outer loop. Get each file and call ourselves with the first special parameter.
    rem The ~ modifier strips any quotes from the folder name before it is quoted again.
    for %%a in (*.*) do call %FCStartPath% // "%~2" "%%a"
    
    goto end
    
    :innerloop
    rem Get each file again and call ourselves again with the second special parameter.
    for %%b in (*.*) do call %FCStartPath% /// %2 %3 "%%b"
    goto end
    
    :compare
    rem Do the actual comparison, with some verbose output.
    if %3==%4 goto end
    echo Comparing
    echo * %3
    echo * %4
    
    fc %3 %4 >nul
    
    rem Get results
    if errorlevel 2 goto notexists
    if errorlevel 1 goto different
    
    echo Files are identical
    goto end
    
    :different
    echo Files differ
    goto end
    
    :notexists
    echo File does not exist
    goto end
    
    :help
    
    echo Compares files within a directory.
    echo Usage: %0 {directory}
    goto end
    
    :end
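
    If you do want the duplicates deleted, here is a minimal sketch of one way to do it. It is not the script above with deletion bolted on: it assumes command extensions are enabled (the default in cmd.exe on Server 2003), uses a hypothetical file name DeleteDuplicates.bat, and swaps the self-calling trick for a call :label subroutine. The if exist guards skip files that an earlier iteration already deleted. It only catches byte-identical files (fc /b), so files that differ in a timestamp still count as different.

    ```batch
    @echo off
    rem Hypothetical DeleteDuplicates.bat: a sketch, not the original script.
    rem Usage: DeleteDuplicates {Folder}
    if "%~1"=="" goto usage

    rem pushd also handles other drives; bail out if the folder is not reachable.
    pushd "%~1" || goto usage

    rem Outer loop: every file that still exists becomes the copy to keep.
    for %%a in (*.*) do if exist "%%a" call :dedupe "%%a"

    popd
    goto :eof

    :dedupe
    rem Inner loop: delete every other file whose bytes match the kept copy.
    rem fc sets errorlevel 0 when the files are identical, 1 or higher otherwise.
    for %%b in (*.*) do if /i not "%%b"=="%~1" if exist "%%b" (
        fc /b "%~1" "%%b" >nul 2>&1
        if not errorlevel 1 (
            echo Deleting "%%b", same content as "%~1"
            del "%%b"
        )
    )
    goto :eof

    :usage
    echo Usage: %~nx0 {directory}
    ```

    Try it on a copy of the folder first, and keep the script itself outside the target folder so it never ends up comparing itself. With several thousand files this is still a pairwise comparison, so expect it to take a while.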