
How to use findstr to remove lines from a file that match strings found in a list file?


I am trying to use findstr to delete lines that match the search strings found in another file. This is what I have been trying to use, but it does not seem to work.

dir %ProjectDir%TypeScript\*.ts /b /s > Files.txt
findstr /v /i /g:%ProjectDir%TypeScript\strictFiles.txt Files.txt > tsFiles.txt

Edit: This also does not seem to work:

dir %ProjectDir%TypeScript\*.ts /b /s | findstr /v /i /g:%ProjectDir%TypeScript\strictFiles.txt > tsFiles.txt

Solution

  • The short and incomplete answer is that you forgot to specify the /L switch of findstr, which forces literal searches. Without it, the first search string determines whether literal search or regular expression mode is used. Your search strings are file paths, which contain a period to separate the base name from the extension, and the period is a meta-character in regular expression mode, so findstr most probably selects that mode.

    In addition, you should provide the /X switch, so that a line is only filtered out when it matches a search string entirely, not merely when it contains one. For example, without /X, the search string D:\Data\some would also match the line D:\Data\some\file.ext.
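    To see the effect of /X in isolation, here is a minimal sketch built on the D:\Data\some example above. The scratch file names in %TEMP% are hypothetical, and no such directories need to exist, because findstr only compares text. Without /X both lines should be dropped, since each contains D:\Data\some; with /X only the exact line is dropped and D:\Data\some\file.ext remains.

    ```batch
    @echo off
    setlocal EnableExtensions DisableDelayedExpansion

    rem // Hypothetical scratch files (names are assumptions):
    set "_LIST=%TEMP%\xdemo_list.txt"
    set "_EXCL=%TEMP%\xdemo_excl.txt"

    rem // File list: a directory path and a file path below it:
    > "%_LIST%" (
        echo D:\Data\some
        echo D:\Data\some\file.ext
    )
    rem // Exclusion list: only the directory path:
    > "%_EXCL%" echo D:\Data\some

    echo Without /X:
    findstr /L /V /I /G:"%_EXCL%" "%_LIST%"
    echo With /X:
    findstr /L /X /V /I /G:"%_EXCL%" "%_LIST%"

    rem // Clean up scratch files:
    del "%_LIST%" "%_EXCL%"
    endlocal
    exit /B
    ```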


    The long and comprehensive answer is that findstr does not make life that easy.

    Let us assume the command line...:

    dir /S /B /A:-D "D:\Project\TypeScript\*.ts" > "Files.txt"
    

    ...produces a list of file paths in Files.txt like this,...:

    D:\Project\TypeScript\sample.ts
    D:\Project\TypeScript\restricted.ts
    D:\Project\TypeScript\excluded.ts
    D:\Project\TypeScript\not-excluded.ts
    D:\Project\TypeScript\ancillary.ts
    D:\Project\TypeScript\[special].ts
    D:\Project\TypeScript\data\test.ts
    D:\Project\TypeScript\data\confidential.ts
    D:\Project\TypeScript\data\arbitrary.ts
    D:\Project\TypeScript\data\.config.ts
    D:\Project\TypeScript\data\other.config.ts
    D:\Project\TypeScript\data.config.ts
    D:\Project\TypeScript\conf.ts\wrong.ts
    

    ...and the file strictFiles.txt contains this...:

    D:\Project\TypeScript\restricted.ts
    D:\Project\TypeScript\excluded.ts
    D:\Project\TypeScript\[special].ts
    D:\Project\TypeScript\confidential.ts
    D:\Project\TypeScript\data\.config.ts
    D:\Project\TypeScript\conf.ts
    

    ...to be filtered out from Files.txt.

    You would expect the command line...:

    findstr /L /X /I /V /G:"strictFiles.txt" "Files.txt" > "tsFiles.txt"
    

    ...to return this in the output file tsFiles.txt,...:

    D:\Project\TypeScript\sample.ts
    D:\Project\TypeScript\not-excluded.ts
    D:\Project\TypeScript\ancillary.ts
    D:\Project\TypeScript\data\test.ts
    D:\Project\TypeScript\data\confidential.ts
    D:\Project\TypeScript\data\arbitrary.ts
    D:\Project\TypeScript\data\other.config.ts
    D:\Project\TypeScript\data.config.ts
    D:\Project\TypeScript\conf.ts\wrong.ts
    

    ...but it actually writes:

    D:\Project\TypeScript\sample.ts
    D:\Project\TypeScript\not-excluded.ts
    D:\Project\TypeScript\ancillary.ts
    D:\Project\TypeScript\[special].ts
    D:\Project\TypeScript\data\test.ts
    D:\Project\TypeScript\data\confidential.ts
    D:\Project\TypeScript\data\arbitrary.ts
    D:\Project\TypeScript\data\.config.ts
    D:\Project\TypeScript\data\other.config.ts
    D:\Project\TypeScript\conf.ts\wrong.ts
    

    The reason for this is that findstr, although in literal search mode due to the /L option, still recognizes the meta-characters of the regular expression mode and allows them to be escaped by a preceding \. The period . and the opening bracket [ in the sample content of strictFiles.txt above are such meta-characters, and both are preceded by the path separator \, so findstr considers them escaped: the preceding \ is dropped and they are taken as a literal . and [. Hence D:\Project\TypeScript\[special].ts is read as D:\Project\TypeScript[special].ts, which matches nothing, and D:\Project\TypeScript\data\.config.ts is read as D:\Project\TypeScript\data.config.ts, which wrongly excludes that file instead.
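    Here is a minimal sketch that isolates the period case, using just two entries of the sample lists (the scratch file names in %TEMP% are hypothetical). Following the behavior described above, the exclusion entry is read as D:\Project\TypeScript\data.config.ts, so this run should keep D:\Project\TypeScript\data\.config.ts and drop D:\Project\TypeScript\data.config.ts, the opposite of what is intended.

    ```batch
    @echo off
    setlocal EnableExtensions DisableDelayedExpansion

    rem // Hypothetical scratch files (names are assumptions):
    set "_LIST=%TEMP%\esc_list.txt"
    set "_EXCL=%TEMP%\esc_excl.txt"

    rem // Two entries taken from the sample Files.txt:
    > "%_LIST%" (
        echo D:\Project\TypeScript\data\.config.ts
        echo D:\Project\TypeScript\data.config.ts
    )
    rem // Exclusion entry exactly as in strictFiles.txt (single separators):
    > "%_EXCL%" echo D:\Project\TypeScript\data\.config.ts

    rem // `\.` is taken as an escaped `.`, so the wrong line is filtered out:
    findstr /L /X /V /I /G:"%_EXCL%" "%_LIST%"

    rem // Clean up scratch files:
    del "%_LIST%" "%_EXCL%"
    endlocal
    exit /B
    ```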

    To work around that, you need to escape every \ in strictFiles.txt by preceding it with another \, so that no meta-character appears escaped to findstr. Here is a script showing one possible way:

    @echo off
    setlocal EnableExtensions DisableDelayedExpansion
    
    rem // Define constants here:
    set "_ROOT=D:\Project\TypeScript"    & rem // (path of root directory)
    set "_MASK=*.ts"                     & rem // (file search pattern)
    set "_LIST=.\Files.txt"              & rem // (path to file list)
    set "_EXCL=.\strictFiles.txt"        & rem // (path to exclusion list)
    set "_TEMP=%TEMP%\%~n0_%RANDOM%.tmp" & rem // (temporary exclusion list)
    set "_FILT=.\tsFiles.txt"            & rem // (path to filtered file list)
    if not defined _FILT set "_FILT=con"
    
    rem // Generate list of files:
    dir /S /B /A:-D "%_ROOT%\%_MASK%" > "%_LIST%"
    
    rem // Modify exclusion list:
    rem /* replace every path separator `\` by an escaped one `\\`,
    rem    so no other characters can appear escaped to `findstr`: */
    > "%_TEMP%" (
        for /F "usebackq delims= eol=|" %%F in ("%_EXCL%") do (
            set "FILE=%%F"
            setlocal EnableDelayedExpansion
            echo(!FILE:\=\\!
            endlocal
        )
    )
    
    rem // Filter out files that occur in modified exclusion list:
    findstr /L /X /V /I /G:"%_TEMP%" "%_LIST%" > "%_FILT%"
    
    rem // Clean up temporary files:
    del "%_LIST%" "%_TEMP%"
    
    endlocal
    exit /B
    

    If your exclusion list, say strictFileNames.txt this time, holds pure file names rather than full file paths, for example,...:

    restricted.ts
    excluded.ts
    [special].ts
    confidential.ts
    .config.ts
    conf.ts
    

    ...the approach is slightly different, because only the last path element of each line in Files.txt is to be taken into account. To achieve this, precede every file name in the exclusion list with a path separator, again an escaped one (\\) for the reason given above, and replace the /X option by /E, which only matches search strings at the end of a line. The leading separator avoids wrong matches: for instance, file.ext would match both D:\Data\file.ext and D:\Data\X-file.ext, whereas \file.ext matches only the former.

    Here is a script which accomplishes that:

    @echo off
    setlocal EnableExtensions DisableDelayedExpansion
    
    rem // Define constants here:
    set "_ROOT=D:\Project\TypeScript"    & rem // (path of root directory)
    set "_MASK=*.ts"                     & rem // (file search pattern)
    set "_LIST=.\Files.txt"              & rem // (path to file list)
    set "_EXCL=.\strictFileNames.txt"    & rem // (path to exclusion list)
    set "_TEMP=%TEMP%\%~n0_%RANDOM%.tmp" & rem // (temporary exclusion list)
    set "_FILT=.\tsFiles.txt"            & rem // (path to filtered file list)
    if not defined _FILT set "_FILT=con"
    
    rem // Generate list of files:
    dir /S /B /A:-D "%_ROOT%\%_MASK%" > "%_LIST%"
    
    rem // Modify exclusion list:
    rem /* precede every file with an escaped path separator `\\`,
    rem    so no other characters can appear escaped to `findstr`: */
    > "%_TEMP%" (
        for /F "usebackq delims= eol=|" %%F in ("%_EXCL%") do (
            echo(\\%%F
        )
    )
    
    rem // Filter out files that occur in modified exclusion list:
    findstr /L /E /V /I /G:"%_TEMP%" "%_LIST%" > "%_FILT%"
    
    rem // Clean up temporary files:
    del "%_LIST%" "%_TEMP%"
    
    endlocal
    exit /B
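    To check why the escaped separator in front of each name matters, here is a minimal sketch using the file.ext example from above (the scratch file names in %TEMP% are hypothetical). With the bare name both lines should be dropped, since both end with file.ext; with \\file.ext, which findstr reads as \file.ext, only D:\Data\file.ext should be dropped and D:\Data\X-file.ext kept.

    ```batch
    @echo off
    setlocal EnableExtensions DisableDelayedExpansion

    rem // Hypothetical scratch files (names are assumptions):
    set "_LIST=%TEMP%\edemo_list.txt"
    set "_BARE=%TEMP%\edemo_bare.txt"
    set "_SEP=%TEMP%\edemo_sep.txt"

    rem // File list from the example above:
    > "%_LIST%" (
        echo D:\Data\file.ext
        echo D:\Data\X-file.ext
    )
    rem // Bare file name, which both lines end with:
    > "%_BARE%" echo file.ext
    rem // File name preceded by an escaped path separator:
    > "%_SEP%" echo \\file.ext

    echo Bare file name:
    findstr /L /E /V /I /G:"%_BARE%" "%_LIST%"
    echo File name preceded by \\:
    findstr /L /E /V /I /G:"%_SEP%" "%_LIST%"

    rem // Clean up scratch files:
    del "%_LIST%" "%_BARE%" "%_SEP%"
    endlocal
    exit /B
    ```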
    

    All of the above sample file contents are chosen so that you can easily play around with them and see the differences when using the options /X or /E and when doubling the path separators \ or not.
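    For such experiments, here is a self-contained sketch that writes the sample lists above to scratch files (their names in %TEMP% are hypothetical, and no D:\Project\TypeScript directory is needed) and runs findstr in four ways. Given the behavior described in this answer, the two /X runs should reproduce the actual and the expected output lists shown earlier. The /E run with a single \ in front of each name is included only for comparison: \.config.ts is then read as the bare suffix .config.ts, so it should also drop data\other.config.ts and data.config.ts.

    ```batch
    @echo off
    setlocal EnableExtensions DisableDelayedExpansion

    rem // Hypothetical scratch files in %TEMP% (names are assumptions):
    set "_LIST=%TEMP%\play_Files.txt"
    set "_EXCL=%TEMP%\play_strictFiles.txt"
    set "_NAME=%TEMP%\play_strictFileNames.txt"
    set "_TEMP=%TEMP%\play_modified.txt"

    rem // Sample file list from above (no real directory needed):
    > "%_LIST%" (
        echo D:\Project\TypeScript\sample.ts
        echo D:\Project\TypeScript\restricted.ts
        echo D:\Project\TypeScript\excluded.ts
        echo D:\Project\TypeScript\not-excluded.ts
        echo D:\Project\TypeScript\ancillary.ts
        echo D:\Project\TypeScript\[special].ts
        echo D:\Project\TypeScript\data\test.ts
        echo D:\Project\TypeScript\data\confidential.ts
        echo D:\Project\TypeScript\data\arbitrary.ts
        echo D:\Project\TypeScript\data\.config.ts
        echo D:\Project\TypeScript\data\other.config.ts
        echo D:\Project\TypeScript\data.config.ts
        echo D:\Project\TypeScript\conf.ts\wrong.ts
    )
    rem // Sample exclusion list with full paths:
    > "%_EXCL%" (
        echo D:\Project\TypeScript\restricted.ts
        echo D:\Project\TypeScript\excluded.ts
        echo D:\Project\TypeScript\[special].ts
        echo D:\Project\TypeScript\confidential.ts
        echo D:\Project\TypeScript\data\.config.ts
        echo D:\Project\TypeScript\conf.ts
    )
    rem // Sample exclusion list with pure file names:
    > "%_NAME%" (
        echo restricted.ts
        echo excluded.ts
        echo [special].ts
        echo confidential.ts
        echo .config.ts
        echo conf.ts
    )

    echo === /X, path separators not doubled ===
    findstr /L /X /V /I /G:"%_EXCL%" "%_LIST%"

    echo === /X, path separators doubled ===
    > "%_TEMP%" (
        for /F "usebackq delims= eol=|" %%F in ("%_EXCL%") do (
            set "FILE=%%F"
            setlocal EnableDelayedExpansion
            echo(!FILE:\=\\!
            endlocal
        )
    )
    findstr /L /X /V /I /G:"%_TEMP%" "%_LIST%"

    echo === /E, file names preceded by a single \ ===
    > "%_TEMP%" (
        for /F "usebackq delims= eol=|" %%F in ("%_NAME%") do echo(\%%F
    )
    findstr /L /E /V /I /G:"%_TEMP%" "%_LIST%"

    echo === /E, file names preceded by \\ ===
    > "%_TEMP%" (
        for /F "usebackq delims= eol=|" %%F in ("%_NAME%") do echo(\\%%F
    )
    findstr /L /E /V /I /G:"%_TEMP%" "%_LIST%"

    rem // Clean up scratch files:
    del "%_LIST%" "%_EXCL%" "%_NAME%" "%_TEMP%"
    endlocal
    exit /B
    ```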