
Find missing files from sequentially numbered files in a directory


I have about 300 000 files in a directory. They are sequentially numbered - x000001, x000002, ..., x300000. But some of these files are missing and I need to write an output text file containing the missing file numbers. The following code does it only up to 10 000 files:

@echo off
setlocal enabledelayedexpansion
set "log=%cd%\logfile.txt"
for /f "delims=" %%a in ('dir /ad /b /s') do (
 pushd "%%a"
  for /L %%b in (10000,1,19999) do (
   set a=%%b
   set a=!a:~-4!
   if not exist "*!a!.csv" >>"%log%" echo "%%a - *!a!.csv"
  )
 popd
)

How can I extend it to 300 000 files?


Solution

  • Solution 1 - simple but slow

    If all 300000 CSV files are expected in the current directory when the batch file is executed, this batch code does the job.

    @echo off
    set "log=%cd%\logfile.txt"
    del "%log%" 2>nul
    for /L %%N in (1,1,9) do if not exist *00000%%N.csv echo %%N - *00000%%N.csv>>"%log%"
    for /L %%N in (10,1,99) do if not exist *0000%%N.csv echo %%N - *0000%%N.csv>>"%log%"
    for /L %%N in (100,1,999) do if not exist *000%%N.csv echo %%N - *000%%N.csv>>"%log%"
    for /L %%N in (1000,1,9999) do if not exist *00%%N.csv echo %%N - *00%%N.csv>>"%log%"
    for /L %%N in (10000,1,99999) do if not exist *0%%N.csv echo %%N - *0%%N.csv>>"%log%"
    for /L %%N in (100000,1,300000) do if not exist *%%N.csv echo %%N - *%%N.csv>>"%log%"
    set "log="
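
    The six loops above differ only in the number of leading zeros. As a sketch of a more compact variant, a single loop can count from 1000001 to 1300000 and keep only the last six digits, which are then already zero-padded. This assumes that every file name is exactly x followed by six digits and .csv, as in the question. It still performs one existence check per number, so it is not expected to be any faster.

    ```batch
    @echo off
    setlocal EnableExtensions EnableDelayedExpansion
    set "log=%cd%\logfile.txt"
    del "%log%" 2>nul

    rem Assumption: the file names are x000001.csv ... x300000.csv.
    rem Counting from 1000001 makes the last six digits zero-padded.
    for /L %%N in (1000001,1,1300000) do (
        set "Num=%%N"
        set "Num=!Num:~-6!"
        if not exist "x!Num!.csv" echo x!Num!.csv>>"%log%"
    )
    endlocal
    ```

    This is the same substring technique the code in the question uses with !a:~-4!, only with six digits instead of four.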
    

  • Solution 2 - faster but more difficult to understand

    This second solution is much faster than the first one because it runs a single directory listing and processes the sorted file names from first to last instead of testing each expected file name separately.

    If the last file is not x300000.csv, the batch code below writes one more line into the log file stating from which number up to the expected end number 300000 the files are missing in the current directory.

    @echo off
    setlocal EnableExtensions EnableDelayedExpansion
    
    rem Delete log file before running file check.
    set "log=%cd%\logfile.txt"
    del "%log%" 2>nul
    
    rem Define initial value for the number in the file names.
    set "Number=0"
    
    rem Define the file extension of the files.
    set "Ext=.csv"
    
    rem Define beginning of first file name with number 1.
    set "Name=x00000"
    
    rem Define position of dot separating name from extension.
    set "DotPos=7"
    
    rem Process the list of files matching the fixed-length pattern in the
    rem current directory, sorted by file name, line by line. Each file name
    rem is compared case-sensitively with the file name expected for the
    rem current number. A subroutine is called if the two are not equal.
    for /F "delims=" %%F in ('dir /B /ON x??????%Ext% 2^>nul') do (
        set /A Number+=1
        if "!Name!!Number!%Ext%" NEQ "%%F" call :CheckDiff "%%F"
    )
    
    rem If the last file does not have the expected number 300000, log the
    rem numbers of the remaining missing files with a single line.
    if "%Number%" NEQ "300000" (
        set /A Number+=1
        echo All files from number !Number! to 300000 are also missing.>>"%log%"
    )
    endlocal
    
    rem Exit this batch file by jumping to the predefined label EOF (End Of File).
    goto :EOF
    
    rem This subroutine is called from the main loop whenever the current file
    rem name does not match the expected file name. With file names in the
    rem expected format, there are two possible reasons:
    
    rem 1. One leading zero must be removed from variable "Name" because the
    rem    number has reached the next power of 10, i.e. from 1-9 to 10,
    rem    from 10-99 to 100, etc.
    
    rem 2. The next file name really has a different number than expected,
    rem    which means one or more files are missing in the list.
    
    rem The first reason is checked by testing whether the dot separating name
    rem and extension is at the correct position. If it is not, one zero is
    rem removed from the end of the string of variable "Name" and the new
    rem expected file name is compared with the current file name.
    
    rem If the possibly updated expected file name is still not equal to the
    rem current file name, the expected file name is written into the log
    rem file because this file is missing in the list.
    
    rem More files can be missing up to the current file name. Therefore the
    rem number is increased and the entire subroutine is executed again as
    rem long as the expected file name is not equal to the current file name.
    
    rem The subroutine is exited with goto :EOF once the expected file name
    rem is equal to the current file name, which continues the main loop
    rem above with the next file name from the directory listing.
    
    :CheckDiff
    set "Expected=%Name%%Number%%Ext%"
    if "!Expected:~%DotPos%,1!" NEQ "." (
        set "Name=%Name:~0,-1%"
        set "Expected=!Name!%Number%%Ext%"
    )
    if "%Expected%" EQU %1 goto :EOF
    echo %Expected%>>"%log%"
    set /A Number+=1
    goto CheckDiff
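
    To try the batch code of solution 2 on a small scale first, a scratch folder with a few gaps can be prepared. The following is only a sketch: the folder name seqtest under %TEMP% is an arbitrary choice for this illustration, and twelve empty files are created before numbers 4 and 10 are deleted again.

    ```batch
    @echo off
    setlocal EnableExtensions EnableDelayedExpansion

    rem The scratch folder name is an assumption; any empty folder works.
    set "TestDir=%TEMP%\seqtest"
    md "%TestDir%" 2>nul
    pushd "%TestDir%" || exit /B 1

    rem Create x000001.csv ... x000012.csv as empty files. Counting from
    rem 1000001 and dropping the leading 1 yields zero-padded numbers.
    for /L %%N in (1000001,1,1000012) do (
        set "Num=%%N"
        type nul >"x!Num:~1!.csv"
    )

    rem Remove two files to create gaps, one at a power-of-ten boundary.
    del x000004.csv x000010.csv

    popd
    endlocal
    ```

    Running the batch code of solution 2 with that folder as current directory should log x000004.csv and x000010.csv, followed by the final line reporting that the files from number 13 to 300000 are missing.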
    

    To understand the commands used in both solutions and how they work, open a command prompt window, execute the following commands, and carefully read the help pages displayed for each command.

    • call /?
    • dir /?
    • echo /?
    • endlocal /?
    • for /?
    • if /?
    • goto /?
    • rem /?
    • set /?
    • setlocal /?