
Batch script to delete specific lines in multiple files


I'm looking for a script or program that can delete specific lines from text files (input.001.log ... input.log.1900). Each file is about 50 MB, and I have around 2,000 files. Every line contains one string. I want to delete:

- every line that contains a doubled character ("aa", "bb", and so on),
- every line with more than 5 digits,
- every line with any special character other than @, # and &,
- every line with more than 2 special characters (for example, a@bcd#38s# must be deleted).

Note: I don't have any programming skills, only a little experience with batch scripting.

So far, I'm using this code:

    @ECHO OFF
    SETLOCAL
    FOR %%i IN (input.txt) DO (
     TYPE "%%i"|FINDstr /l /v "aa bb cc dd ff gg hh ii jj kk ll mm nn pp qq rr ss tt uu vv xx yy zz" >"input_1.txt"
    )
    GOTO :EOF

Solution

  • This would be easy if batch had a decent regular expression utility, but FINDSTR's regex support is extremely limited and buggy. Even so, FINDSTR can solve this problem fairly efficiently without much difficulty.

    You aren't clear about what you mean by "special character". My interpretation is that you want to accept only the letters a-z and A-Z, the digits 0-9, and the special characters @, #, and &. I can only guess that you are building a dictionary of potential passwords.

    I find this problem easier to solve if you build environment variables that represent the various character classes and the individual logical expressions, and then use those variables within the search string.

    I recommend you write your modified files to a new folder.

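    @echo off
    setlocal

    rem Character classes (findstr /i below makes the letters case-insensitive)
    set "alpha=abcdefghijklmnopqrstuvwxyz"
    set "num=0123456789"
    set "sym=@#&"

    rem Doubled characters: any line containing one of these is removed
    set "dups=aa bb cc dd ee ff gg hh ii jj kk ll mm nn oo pp qq rr ss tt uu vv ww xx yy zz 00 11 22 33 44 55 66 77 88 99 @@ ## &&"
    rem Any character that is not a letter, digit or allowed symbol
    set "bad=[^%alpha%%num%%sym%]"
    rem At least 6 digits anywhere on the line
    set "num6=[%num%][^%num%]*[%num%][^%num%]*[%num%][^%num%]*[%num%][^%num%]*[%num%][^%num%]*[%num%]"
    rem At least 3 of the allowed symbols anywhere on the line
    set "sym3=[%sym%][^%sym%]*[%sym%][^%sym%]*[%sym%]"

    set "source=c:\your\source\folder"
    set "destination=c:\your\destination\folder"

    rem Adjust the *.txt mask to match your input file names
    rem /r = regular expressions, /i = ignore case, /v = print only non-matching lines
    for %%F in ("%source%\*.txt") do findstr /riv "%dups% %bad% %num6% %sym3%" "%%F" >"%destination%\%%~nxF"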
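    If you want to check the rules themselves outside batch, the following minimal Python sketch may help. It is not part of the original answer. It rewrites the same four expressions as Python regular expressions and keeps only the lines that none of them match, which is what `findstr /r /i /v` is meant to do. The names `keep`, `filter_lines` and `filter_file` are hypothetical. The sketch assumes mostly ASCII text with one string per line, and it models the intended rules, not FINDSTR's quirks.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator

ALPHA = "abcdefghijklmnopqrstuvwxyz"
NUM = "0123456789"
SYM = "@#&"

# The four FINDSTR expressions, rewritten for Python's re module.
DUPS = "|".join(re.escape(c * 2) for c in ALPHA + NUM + SYM)  # aa|bb|...|&&
BAD = f"[^{ALPHA}{NUM}{SYM}]"                                  # disallowed character
NUM6 = f"(?:[{NUM}][^{NUM}]*){{5}}[{NUM}]"                      # 6 or more digits
SYM3 = f"(?:[{SYM}][^{SYM}]*){{2}}[{SYM}]"                      # 3 or more of @ # &

# IGNORECASE mirrors findstr /i; ASCII limits case folding to plain a-z/A-Z.
FILTER = re.compile("|".join([DUPS, BAD, NUM6, SYM3]), re.IGNORECASE | re.ASCII)


def keep(line: str) -> bool:
    """True if no pattern matches, i.e. findstr /v would print the line."""
    return FILTER.search(line) is None


def filter_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield the surviving lines without their line terminators."""
    for line in lines:
        text = line.rstrip("\r\n")
        if keep(text):
            yield text


def filter_file(src: Path, dst: Path) -> None:
    """Copy the surviving lines of src to dst, keeping original line endings."""
    # latin-1 maps every byte to one character, so decoding never fails;
    # any non-ASCII character is then rejected by BAD.
    with open(src, encoding="latin-1", newline="") as fin, \
         open(dst, "w", encoding="latin-1", newline="") as fout:
        for line in fin:
            if keep(line.rstrip("\r\n")):
                fout.write(line)


if __name__ == "__main__":
    sample = ["abc@12", "password", "a@bcd#38s#", "a1b2c3d4e5",
              "a1b2c3d4e5f6", "abc 123", "AaB"]
    print(list(filter_lines(sample)))  # ['abc@12', 'a1b2c3d4e5']
```

    To process a whole folder with this sketch, you could loop over `Path(source).glob(...)` with a pattern that matches your file names and call `filter_file` for each one. As recommended above, write the results into a separate destination folder.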
    

    Edit in response to Magoo's comment

    The solution must be modified slightly if you are running Windows XP, because FINDSTR on XP has a regular expression length limit of 127 bytes, and the expanded %num6% expression (142 characters) exceeds that limit.
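    You can check the arithmetic behind the XP limit by expanding the variables the same way cmd does. The following minimal Python sketch builds the expanded strings for bad, num6 and sym3 and reports which ones exceed 127 characters. The 127 figure is the XP limit stated above, and the variable names are illustrative. The dups list is left out because each space-separated entry in it is a separate two-character search string.

```python
NUM = "0123456789"
SYM = "@#&"
XP_LIMIT = 127  # regex length limit on XP, as stated in the answer

# Expanded forms of the batch variables after %...% substitution.
bad = f"[^abcdefghijklmnopqrstuvwxyz{NUM}{SYM}]"
num6 = f"[^{NUM}]*".join([f"[{NUM}]"] * 6)  # 6 * 12 + 5 * 14 characters
sym3 = f"[^{SYM}]*".join([f"[{SYM}]"] * 3)  # 3 * 5 + 2 * 7 characters

lengths = {"bad": len(bad), "num6": len(num6), "sym3": len(sym3)}
too_long = [name for name, size in lengths.items() if size > XP_LIMIT]

if __name__ == "__main__":
    print(lengths)   # {'bad': 42, 'num6': 142, 'sym3': 29}
    print(too_long)  # ['num6']
```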

    The solution should work on XP if you change num6 to

    set "num6=[%num%].*[%num%].*[%num%].*[%num%].*[%num%].*[%num%]"
    

    That search logically gives the same result, but it is significantly less efficient because it may require excessive backtracking during the matching process.
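    You can spot-check the claim that both forms of num6 match the same lines. The Python sketch below is an illustration only; FINDSTR does not run it. It compiles both versions with Python's re module and compares each one against a direct count of digits on random lines. The alphabet, sample count, seed and function names are arbitrary assumptions. The sketch checks only that the results agree; it does not measure the backtracking cost.

```python
import random
import re

NUM = "0123456789"
DIGIT = f"[{NUM}]"

# Original num6: the gaps between digits may contain only non-digits.
strict = re.compile(f"[^{NUM}]*".join([DIGIT] * 6))
# XP-friendly num6: the gaps may contain anything, digits included.
loose = re.compile(".*".join([DIGIT] * 6))


def has_six_digits(line: str) -> bool:
    """Direct definition of the rule: at least 6 ASCII digits on the line."""
    return sum(ch in NUM for ch in line) >= 6


def check(samples: int = 20000, seed: int = 1) -> int:
    """Compare both patterns with the direct rule on random lines."""
    rng = random.Random(seed)
    alphabet = "ab@#&0123456789"
    for _ in range(samples):
        line = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 20)))
        expected = has_six_digits(line)
        assert (strict.search(line) is not None) == expected, line
        assert (loose.search(line) is not None) == expected, line
    return samples


if __name__ == "__main__":
    print(f"both patterns agree with the rule on {check()} random lines")
```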