
findstr: search strings too long


I am trying to compare two lists of IPs and output the difference using the findstr command in Windows, but I am having difficulty getting it to work. The command I am using is shown after the example data below.

EDIT: My objective is to compare the IPs that were scanned successfully with the IPs that were scanned successfully and also authenticated, and to write the entries that are in IPsSuccessfullyScanned.txt but not in IPsSuccessfullyScannedwithAuthentication.txt to IPsSuccessfullyScannedButNotAuthenticated.txt.

Let's say IPsSuccessfullyScanned.txt contains:

    192.168.0.1
    192.168.0.2
    192.168.0.3
    192.168.0.4
    192.168.0.5
    192.168.0.6
    192.168.0.7
    192.168.0.8-192.168.0.12

and IPsSuccessfullyScannedwithAuthentication.txt (the IPs that authenticated and were successfully scanned) contains:

    192.168.0.1
    192.168.0.2
    192.168.0.3
    192.168.0.4
    192.168.0.6
    192.168.0.8-192.168.0.10
    192.168.0.12

Then IPsSuccessfullyScannedButNotAuthenticated.txt should contain:

    192.168.0.5
    192.168.0.7
    192.168.0.11

    findstr /vixg:IPsSuccessfullyScanned.txt IPsSuccessfullyScannedwithAuthentication.txt > IPsSuccessfullyScannedButNotAuthenticated.txt

What I am trying to achieve is very similar to this post:

.bat file to compare two text files and output the difference

Here is my issue, though: the file size of IPs2.txt is 720 bytes. When I researched the findstr command, I found that for a regular expression search the maximum search string length is 254 bytes. A regular expression between 255 and 511 bytes long results in a FINDSTR: Out of memory error with ERRORLEVEL 2.

A regular expression longer than 511 bytes results in the FINDSTR: Search string too long. error, which is the error I am currently getting.

My question is: what alternatives can I use to compare the two text files? Any other suggestions for resolving this as simply as possible are welcome; even a .bat file would help.

References:

http://ss64.com/nt/findstr-escapes.html

What are the undocumented features and limitations of the Windows FINDSTR command?


Solution

    @ECHO OFF
    SETLOCAL
    SET "sourcedir=U:\sourcedir"
    SET "filename1=%sourcedir%\q35090416.txt"
    SET "filename2=%sourcedir%\q35090416_2.txt"
    :: remove variables starting $
    FOR  /F "delims==" %%a In ('set $ 2^>Nul') DO SET "%%a="
    :: Read first file into memory
    FOR /f "tokens=1*delims=]" %%a IN ('find /n /v "" "%filename1%"') DO SET "$%%a=%%b"
    :: Report each line of the second file that matches no stored line of the first
    FOR /f "usebackqdelims=" %%a IN ("%filename2%") DO (
     SET "same="
     FOR /f "tokens=1*delims==" %%b IN ('SET $ 2^>nul') DO (
      IF /i "%%a"=="%%c" SET "same=y"
     )
     IF NOT DEFINED same ECHO(%%a
    )
    
    GOTO :EOF
    

    You would need to change the settings of sourcedir and filename* to suit your circumstances.

    I used files named q35090416*.txt containing some dummy data for my testing.

    This routine reports the lines contained in the second file that are not contained in the first, which appears to be the objective of your findstr /vixg command. Please explain clearly what you are attempting to do; we can't fix something that doesn't work if we don't know precisely what the objective is.

    The routine may have problems if any line starts with ], and it is subject to the usual problems encountered with batch string processing.
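    For comparison, here is a minimal sketch of the question's findstr approach with the two file arguments swapped, so that it reports lines of IPsSuccessfullyScanned.txt that are absent from IPsSuccessfullyScannedwithAuthentication.txt, and with /L added so the dots in the addresses are matched literally rather than as regex wildcards. To stay self-contained it first writes the question's sample data to files in the current directory (overwriting any files with those names), one entry per line with no blank lines. Because findstr only compares whole lines as text, the range line 192.168.0.8-192.168.0.12 comes out unchanged rather than as 192.168.0.11, which is why this data needs a routine that expands ranges.

```batch
@ECHO OFF
SETLOCAL
:: Write the question's sample data, one entry per line (overwrites these files)
(
 FOR %%i IN (192.168.0.1 192.168.0.2 192.168.0.3 192.168.0.4 192.168.0.5 192.168.0.6 192.168.0.7 192.168.0.8-192.168.0.12) DO ECHO %%i
)>IPsSuccessfullyScanned.txt
(
 FOR %%i IN (192.168.0.1 192.168.0.2 192.168.0.3 192.168.0.4 192.168.0.6 192.168.0.8-192.168.0.10 192.168.0.12) DO ECHO %%i
)>IPsSuccessfullyScannedwithAuthentication.txt
:: /L literal strings (dots are not wildcards), /I ignore case, /X whole-line match,
:: /V keep non-matching lines, /G: read the search strings from the authenticated list
findstr /l /i /x /v /g:IPsSuccessfullyScannedwithAuthentication.txt IPsSuccessfullyScanned.txt > IPsSuccessfullyScannedButNotAuthenticated.txt
TYPE IPsSuccessfullyScannedButNotAuthenticated.txt
```

    With the sample data this should print 192.168.0.5, 192.168.0.7 and 192.168.0.8-192.168.0.12.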


    Your edits have changed the problem radically. The issue has much to do with the content of the files. The limit of 254 characters for findstr is a limit per line, not of the file overall.

    @ECHO OFF
    SETLOCAL
    SET "sourcedir=U:\sourcedir"
    SET "filename1=%sourcedir%\q35090416.txt"
    SET "filename2=%sourcedir%\q35090416_2.txt"
    :: remove variables starting $
    FOR  /F "delims==" %%a In ('set $ 2^>Nul') DO SET "%%a="
    :: Read second (authenticated) file into memory, expanding any ranges
    FOR /f "usebackqtokens=1*delims=-" %%a IN ("%filename2%") DO (
     IF "%%b"=="" (SET "$%%a=Y") ELSE (
      FOR /f "tokens=1-4,8delims=." %%p IN ("%%a.%%b") DO FOR /l %%x IN (%%s,1,%%t) DO SET "$%%p.%%q.%%r.%%x=y"
     )
    )
    :: Echo each address from the first file that was not recorded above
    FOR /f "usebackqtokens=1*delims=-" %%a IN ("%filename1%") DO (
     IF "%%b"=="" (IF NOT DEFINED $%%a ECHO %%a) ELSE (
      FOR /f "tokens=1-4,8delims=." %%p IN ("%%a.%%b") DO FOR /l %%x IN (%%s,1,%%t) DO (
      IF NOT DEFINED $%%p.%%q.%%r.%%x ECHO %%p.%%q.%%r.%%x
      )
     )
    )
    
    GOTO :EOF
    

    This solution should fit. For testing I use filenames that correspond to the SO question number so that I can revisit the problem if required. Hence filename1 here is your successfully-scanned list, filename2 is the authenticated list, and the output is the difference.

    If you wish, you can enclose the entire second FOR statement in parentheses and redirect the output to a file, i.e.

    ...
    for ....
    )
    ...
    

    becomes

    ...
    (
    for ....
    )
    )>somefilename
    ...
    

    to redirect the output to somefilename.

    The routine works by first removing all environment variables that start with $ (normally there are none, but this makes sure).

    Then it examines the second file, splitting each line into %%a and %%b (the tokens on either side of the delimiter "-"). If the second token does not exist, it sets an environment variable, e.g. "$192.168.0.1". If it does exist, it re-tokenises "%%a.%%b" (note the .) and assigns tokens 1-4 and 8. For example, "192.168.0.8-192.168.0.10" is split into "192.168.0.8" and "192.168.0.10", which are put together as "192.168.0.8.192.168.0.10", so the tokens 192, 168, 0, 8 and 10 are assigned to %%p..%%t. The FOR /L loop then sets $192.168.0.8 through $192.168.0.10. This assumes both ends of a range share their first three octets.

    Next is a similar story for the first file, but this time the routine simply checks whether each variable is set. If it isn't, the address is echoed.
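    To see the range-expansion step on its own, here is a minimal stand-alone sketch that applies the same two FOR /F splits and the FOR /L loop to a single hard-coded entry. The variable name range and its value are illustrative (taken from the question's sample data), and, like the routine, it assumes both ends of the range share their first three octets.

```batch
@ECHO OFF
SETLOCAL
:: Illustrative entry in the "first-last" range format used by the lists
SET "range=192.168.0.8-192.168.0.10"
:: Clear any existing variables whose names start with $
FOR /F "delims==" %%a IN ('SET $ 2^>nul') DO SET "%%a="
:: Split at "-" into the first address and the last address of the range
FOR /f "tokens=1*delims=-" %%a IN ("%range%") DO (
 REM Re-split the joined string 192.168.0.8.192.168.0.10 at "." so that
 REM tokens 1-4 are the first address and token 8 is the last octet
 FOR /f "tokens=1-4,8delims=." %%p IN ("%%a.%%b") DO (
  FOR /l %%x IN (%%s,1,%%t) DO SET "$%%p.%%q.%%r.%%x=y"
 )
)
:: List the variables that were created
SET $
```

    SET $ then lists the three variables $192.168.0.8, $192.168.0.9 and $192.168.0.10, one for each address in the range.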