for-loop batch-file binary cmd findstr

Extract RegExp string from binary in windows batch


A little problem has been bugging me for a couple of days. I'm trying to extract a string that I can describe with a regular expression from a *.exe binary (text like "1.01.01.00T123") into an environment variable for further use.
I've found the string with

findstr /i [0-9]\.[0-9][0-9]\.[0-9][0-9]\.[0-9][0-9][T][0-9][0-9][0-9] name.exe>outp.bin

Now my string is in a somewhat smaller binary, maybe 200 bytes. Then I tried to use the output of findstr in a "for /f", but what delimiter should I use for binary data? Nothing is guaranteed; even dots and blanks can come and go.
Something like:

for /f "tokens=1,2,3,4* delims=." %%a in ('findstr /i [0-9]\.[0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]T[0-9][0-9][0-9] name.exe') do (
echo %%a
echo %%b
echo %%c
echo %%d
)

It only works halfway: the first token is too long, and the last part, "xxTxxx", isn't a token by this definition. Besides, a dot can also occur elsewhere in the binary, not only in my string.
I thought of shortening outp.bin in a loop, always cutting off the first byte and then checking whether my string is at the start of outp.bin, but I still haven't found a way to do this. Is it possible?
Is there a less complex way to just copy my regex match into a variable?
I hope I've missed some magic regex command in the standard command shell.


Solution

  • It is nearly impossible to do what you want with pure batch, because your binary may contain null bytes and batch cannot process them. But the problem is easily solved with VBS or JScript and regular expressions.

    Here is a very crude VBS solution, with lots of room for improvement. But it works.

    findStr.vbs

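    ' Match text such as 1.01.01.00T123 anywhere in the input
    Set myRegExp = New RegExp
    myRegExp.IgnoreCase = True
    myRegExp.Global = True   ' report every occurrence, not just the first
    myRegExp.Pattern = "\d\.\d\d\.\d\d\.\d\dT\d\d\d"
    ' Read all of stdin (the redirected exe) and run the regex over it
    Set matches = myRegExp.Execute(WScript.StdIn.ReadAll())
    ' Print each match on its own line
    For Each match In matches
      WScript.StdOut.WriteLine(match.value)
    Next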
    

    Call the script with CSCRIPT and redirect its input from your exe file.

    <name.exe cscript //nologo findStr.vbs
    

    You could use batch to process the results via FOR /F.

    for /f "delims=" %%A in ('^<name.exe cscript //nologo findStr.vbs') do echo %%A
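
    The loop above only echoes the matches, while the question asks for an environment variable. A minimal sketch of storing the first match instead, assuming findStr.vbs and name.exe are both in the current directory; the variable name str is just an illustrative choice:

    ```batch
    @echo off
    setlocal
    rem Assumes findStr.vbs (above) and name.exe are in the current directory.
    set "str="
    rem Keep only the first match; "if defined" is evaluated at run time,
    rem so delayed expansion is not needed inside the loop.
    for /f "delims=" %%A in ('^<name.exe cscript //nologo findStr.vbs') do (
      if not defined str set "str=%%A"
    )
    if not defined str (
      echo No matching string found in name.exe
      exit /b 1
    )
    echo Found: %str%
    ```

    Dropping the "if not defined str" test inside the loop keeps the last match instead of the first.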
    


    UPDATE - 2015-08-26

    You could easily solve this with JREPL.BAT, a pure script-based regex processing utility (hybrid JScript/batch) that runs natively on any Windows machine from XP onward. Full documentation is embedded within the script.

    The following simply lists the value(s) found in the file. Note that the /M option is required because of possible null bytes in the exe.

    call jrepl "\d\.\d\d\.\d\d\.\d\dT\d\d\d" $0 /jmatch /m /f name.exe
    

    To capture the value in a variable (or the last value if there are multiple occurrences):

    for /f "delims=" %%A in (
      'jrepl "\d\.\d\d\.\d\d\.\d\dT\d\d\d" $0 /jmatch /m /f name.exe'
    ) do set "str=%%A"
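
    Once the value is isolated in a variable, the tokenizing the question attempted becomes safe, because the stray dots and blanks from the surrounding binary are gone. A sketch, assuming the sample value from the question stands in for the captured %str%; part1 to part5 are hypothetical names:

    ```batch
    @echo off
    setlocal
    rem Sample value from the question; in a real script %str% would come
    rem from the JREPL (or findStr.vbs) loop instead of being hard-coded.
    set "str=1.01.01.00T123"

    rem The isolated match holds no binary junk, so "." and "T" are now safe
    rem delimiters. "t" is listed too because the search is case-insensitive.
    for /f "tokens=1-5 delims=.Tt" %%a in ("%str%") do (
      set "part1=%%a"
      set "part2=%%b"
      set "part3=%%c"
      set "part4=%%d"
      set "part5=%%e"
    )
    rem Prints: 1 01 01 00 123
    echo %part1% %part2% %part3% %part4% %part5%
    ```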