Using findstr with regular expressions to search sections of a string


I have written a batch script that loops through each line of files.txt, identifies whether the file is a car, a boat, or neither, and writes the result to SendType.txt. It does this with the findstr command, searching the lists in car.txt and boat.txt. For each line of files.txt, the script first looks for the string in car.txt; if it is found, car is written to SendType.txt. If not, it looks for the string in boat.txt; if it is found there, boat is written to SendType.txt. If the string is in neither car.txt nor boat.txt, the text neither is written to SendType.txt.

Right now the code searches for the entire line. Each string is similar to 11111_2222-22_2010-09-09_10-24-20.zip (11111=ID, 2222-22=model, 2010-09-09=date, 10-24-20=transaction ID).

I would like to change my current findstr line, which searches for the entire line, so that it matches the entire string EXCEPT the date portion. I have attached my code below for reference, along with some examples of the input files. Thanks in advance!

@echo off
FOR /F %%a in (files.txt) do (
    findstr %%a car.txt
    if errorlevel 1 (
        findstr %%a boat.txt
        if errorlevel 1 (
            echo neither >>SendType.txt
        ) else (
            echo boat >>SendType.txt
        )
    ) else (
        echo car >>SendType.txt
    )
)

car.txt
11111_2222-22_2010-09-09_10-24-20.zip
11112_2222-11_2011-11-09_10-24-25.zip

boat.txt
11122_1111-22_2012-04-09_11-29-56.zip
11144_3333-11_2011-12-22_06-29-66.zip

files.txt
11122_1111-22_2000-01-01_11-29-56.zip
11144_3333-11_2000-01-01_06-29-66.zip
11155_1212-12_2000-01-01_11-19-69.zip
11111_2222-22_2000-01-01_10-24-20.zip
11112_2222-11_2000-01-01_10-24-25.zip
12345_2233-12_2000-01-01_07-27-44.zip

DESIRED OUTPUT:

SendType.txt
boat
boat
neither
car
car
neither

UPDATE 10/15 3:00 PM: The current approach, using dbenham's code and the parsing technique, is as follows:

@echo off
>SendType.txt (
    for /f "tokens=1,2,3,4 delims=_" %%a in (filenames.txt) do (
        findstr /c:"%%a_%%b_%%d" sneakernet.txt >nul && (echo sneakernet) || (
            findstr /c:"%%a_%%b_%%d" devmed.txt >nul && (echo devmed) || echo tanto
        )
    )
)

Solution

  • If the format of the IDs is fixed with two _ that precede the date, then the solution is easy: simply use FOR /F to parse the values.

    I like to use && and || instead of testing ERRORLEVEL. Also, you don't need the output of FINDSTR, so you can redirect it to nul. You should verify that the string matches from the beginning of the line (the /B option). Finally, you only need to redirect once, so you can overwrite instead of append, which makes repeated testing easier because there is no need to delete the file before each run.

    @echo off
    >SendType.txt (
      for /f "tokens=1,2 delims=_" %%a in (files.txt) do (
        findstr /bc:"%%a_%%b" car.txt >nul && (echo car) || (
          findstr /bc:"%%a_%%b" boat.txt >nul && (echo boat) || echo neither
        )
      )
    )
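
    As a minimal sketch of how the tokens= and delims= options split one of these names, the following parses a single sample string taken from the question. The printed labels are only illustrative; the field meanings (ID, model, date, transaction ID) are the ones given in the question.

    ```batch
    @echo off
    rem Sketch only: split one sample name from the question on "_".
    rem A quoted string in the IN clause is parsed as text rather than read as a file.
    for /f "tokens=1-4 delims=_" %%a in ("11111_2222-22_2010-09-09_10-24-20.zip") do (
      echo ID:             %%a
      echo Model:          %%b
      echo Date:           %%c
      echo Transaction ID: %%d
    )
    ```

    Note that the fourth token still carries the .zip extension, because _ is the only delimiter.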
    


    If the format (and possibly length) of the ID can vary, but the format (length) of the date portion is constant, then you can use a substring:

    @echo off
    setlocal enableDelayedExpansion
    >SendType.txt (
      for /f "delims=" %%a in (files.txt) do (
        set "ln=%%a"
        findstr /bc:"!ln:~0,-23!" car.txt >nul && (echo car) || (
          findstr /bc:"!ln:~0,-23!" boat.txt >nul && (echo boat) || echo neither
        )
      )
    )
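
    To see what the substring expression hands to FINDSTR, here is a small sketch that applies it to one sample name from the question. It assumes the tail of every name has the same length as in that sample.

    ```batch
    @echo off
    setlocal enableDelayedExpansion
    rem Sketch only: sample name taken from the question.
    set "ln=11111_2222-22_2010-09-09_10-24-20.zip"
    rem The last 23 characters are the date (10), "_" (1),
    rem the transaction ID (8) and ".zip" (4).
    echo Search prefix: !ln:~0,-23!
    ```

    For this sample the prefix is 11111_2222-22_ (including the trailing underscore), so only the ID and model take part in the match; the transaction ID is ignored.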
    


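    If neither the ID nor the date format is constant, then I would change the content of boat.txt and car.txt by stripping off the date portion. Then you can use the FINDSTR /G option, which reads the search strings from a file. The /I option is needed because of a bug in FINDSTR: with multiple case-sensitive literal search strings, it can fail to find matches.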

    car.txt
    11111_2222-22
    11112_2222-11
    

    boat.txt
    11122_1111-22
    11144_3333-11
    

    @echo off
    setlocal enableDelayedExpansion
    >SendType.txt (
      for /f "delims=" %%a in (files.txt) do (
        echo %%a|findstr /blig:car.txt >nul && (echo car) || (
          echo %%a|findstr /blig:boat.txt >nul && (echo boat) || echo neither
        )
      )
    )
    

    Updated answer

    Now that I understand the requirements, this should solve the problem. It is a variation of the first script in my original answer, switched to a regular expression so that the date portion is matched by wildcards. The test could be made more stringent by substituting [0-9] for each . in the regex.

    @echo off
    >SendType.txt (
      for /f "tokens=1,2,4 delims=_" %%a in (files.txt) do (
        findstr /brc:"%%a_%%b_....-..-.._%%c" car.txt >nul && (echo car) || (
          findstr /brc:"%%a_%%b_....-..-.._%%c" boat.txt >nul && (echo boat) || echo neither
        )
      )
    )
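
    The more stringent test mentioned above might look like the following sketch. The variable datePat is a hypothetical helper introduced here only to avoid repeating the pattern; it is not part of the original answer.

    ```batch
    @echo off
    rem Sketch only: datePat is a hypothetical helper variable.
    rem It is set before the parenthesized block so that it expands when the block is parsed.
    set "datePat=[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]"
    >SendType.txt (
      for /f "tokens=1,2,4 delims=_" %%a in (files.txt) do (
        findstr /brc:"%%a_%%b_%datePat%_%%c" car.txt >nul && (echo car) || (
          findstr /brc:"%%a_%%b_%datePat%_%%c" boat.txt >nul && (echo boat) || echo neither
        )
      )
    )
    ```

    The . in the .zip extension carried by %%c is still treated as a regex wildcard here, as in the script above, so it matches any single character in that position.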