
Trying to extract a GUID from a text, using batch (findstr + regexp)


I want to isolate a specific string from a text provided in a variable, using batch, but it doesn't seem to work as intended. I may be writing the regexp wrong, or maybe I have misunderstood the way "findstr" works.

The specific string that I need to isolate is a GUID, which has a standard format of alphanumeric characters arranged in groups separated by "-", like this: 8-4-4-4-12.

@echo off
setlocal enabledelayedexpansion

SET str="This is a string that has a long uuid: (UUID: 359f975d-2649-4e20-b7c0-b452aaaca4b2)"
SET rx=[a-zA-Z0-9]{8}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{12}

 FOR %%u IN ('FINDSTR /r "!rx!" "!str!"') DO ECHO %%u

endlocal

Basically, what I need is to store the GUID in a separate variable, so I can use it later on. If that can be achieved in a different manner, I'm happy to learn!

Thanks!


Solution

    @ECHO Off
    SETLOCAL
    SET "str=This is a string that has a long uuid: (UUID: 359f975d-2649-4e20-b7c0-b452aaaca4b2)"
    
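    :: Theoretical method: one FINDSTR call with the whole GUID pattern
    
    :: one hex digit; with FINDSTR /i this also matches A-F
    SET "hn=[a-f0-9]"
    SET "hn4=%hn%%hn%%hn%%hn%"
    SET "hn8=%hn4%%hn4%"
    :: 8-4-4-4-12 layout; the 12-digit group is hn8 followed by hn4
    SET "wrx=%hn8%-%hn4%-%hn4%-%hn4%-%hn8%%hn4%"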
    :again
    IF NOT DEFINED str ECHO notfound&GOTO done
    ECHO %str%|FINDSTR /b /r /i "%wrx%">NUL
    IF ERRORLEVEL 1 (
     REM did not find string
     SET "str=%str:~1%"
     GOTO again
    )
    :: str now begins with the GUID, which is 36 characters long
    SET "str=%str:~0,36%"
    ECHO found "%str%"
    
    :done
    
    :: BFI method: test the candidate GUID in small chunks
    
    SET "str=This is a string that has a long uuid: (UUID: 359f975d-2649-4e20-b7c0-b452aaaca4b2)"
    SET "hn=[a-f0-9]"
    SET "hn4=%hn%%hn%%hn%%hn%"
    SET "hn8=%hn4%%hn4%"
    
    :bfiagain
    IF NOT DEFINED str ECHO notfound&GOTO donebfi
    :: "regex" using brute-force and ignorance
    ECHO %str:~0,9%|FINDSTR /b /i /r  "%hn8%-">NUL
    IF ERRORLEVEL 1 GOTO bfino
    ECHO %str:~9,5%|FINDSTR /b /i /r  "%hn4%-">NUL
    IF ERRORLEVEL 1 GOTO bfino
    ECHO %str:~14,10%|FINDSTR /b /i /r  "%hn4%-%hn4%-">NUL
    IF ERRORLEVEL 1 GOTO bfino
    ECHO %str:~24,12%|FINDSTR /b /i /r  "%hn4%%hn8%">NUL
    :bfino
    IF ERRORLEVEL 1 (
     SET "str=%str:~1%"
     GOTO bfiagain
    )
    SET "str=%str:~0,36%"
    ECHO found "%str%"
    
    :donebfi
    
    GOTO :EOF
    

    Well, not so squeezy...

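    Fundamentally, findstr implements a very small subset of regex: character classes such as [a-f0-9], . and *, and the ^ and $ anchors, but no {n} quantifiers, + or ?. So the {8}-style quantifiers in the question's pattern are simply not understood. findstr is intended to locate a character-string in a file (or in piped input).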
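
For comparison, a full regex engine accepts the question's pattern unchanged. The sketch below is Python rather than batch (cmd itself has no engine with {n} quantifiers), and `find_guid` is a hypothetical helper name. It runs the question's `rx` verbatim, which suggests the regex itself is reasonable and the trouble lies in what findstr supports.

```python
import re

# The question's pattern, copied verbatim. Python's re module understands
# {n} quantifiers, which findstr does not.
RX = r"[a-zA-Z0-9]{8}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{12}"


def find_guid(text):
    """Return the first substring matching RX, or None (hypothetical helper)."""
    match = re.search(RX, text)  # search scans the whole string, no trimming needed
    return match.group(0) if match else None


sample = "This is a string that has a long uuid: (UUID: 359f975d-2649-4e20-b7c0-b452aaaca4b2)"
print(find_guid(sample))  # 359f975d-2649-4e20-b7c0-b452aaaca4b2
```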

    In theory, you could repeat [a-f0-9] the required number of times, add the - separators, and use the result as the "regex". Then check whether the subject string begins (/b) with that pattern; if not, lop off the first character and repeat until the GUID is found or the subject string is empty.
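
The trim-and-retry idea can be sketched in Python. This is an illustration outside cmd, not the batch code itself: it assumes `re.match` with `re.IGNORECASE` can stand in for `findstr /b /i`, and the names `HN`, `HN4`, `HN8`, `WRX` and `find_guid_by_trimming` are hypothetical, chosen to mirror the hn, hn4, hn8 and wrx variables in the batch code.

```python
import re

# Build the pattern by repetition, mirroring hn/hn4/hn8/wrx (no {n} needed).
HN = "[a-f0-9]"
HN4 = HN * 4
HN8 = HN4 * 2
WRX = f"{HN8}-{HN4}-{HN4}-{HN4}-{HN8}{HN4}"
PATTERN = re.compile(WRX, re.IGNORECASE)


def find_guid_by_trimming(s):
    """Drop one leading character at a time until s begins with a GUID."""
    while s:  # an empty string means "notfound"
        if PATTERN.match(s):  # match is anchored at the start, like findstr /b
            return s[:36]     # a GUID is 36 characters long
        s = s[1:]             # lop off the first character and retry
    return None


print(find_guid_by_trimming("(UUID: 359f975d-2649-4e20-b7c0-b452aaaca4b2)"))
```

Trimming one character per pass is quadratic in the worst case, which is harmless for a single line of text.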

    Notes: a GUID uses hexadecimal digits only (0-9 and a-f), not arbitrary alphanumerics. findstr supports /i for a case-insensitive comparison, which keeps each "character-match" class short ([a-f0-9] rather than [a-fA-F0-9]). Yes, I know ^ can be used in a regex (even one from Uncle Bill's little programmers' toolset), but I prefer /b.
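
The effect of the hex-only class and of case-insensitive matching can be checked in a few lines of Python. This is an illustration, not findstr itself: `re.IGNORECASE` plays the role of `/i`, and the sample strings are made up for the purpose.

```python
import re

# The question's alphanumeric pattern versus a hex-only pattern.
ALNUM = re.compile(r"[a-zA-Z0-9]{8}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{12}")
# IGNORECASE covers A-F, so each class only needs to list a-f and 0-9.
HEX = re.compile(r"[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}", re.IGNORECASE)

# Right 8-4-4-4-12 shape, but 'z' is not a hex digit, so this is not a GUID.
not_a_guid = "-".join("z" * n for n in (8, 4, 4, 4, 12))
upper_guid = "359F975D-2649-4E20-B7C0-B452AAACA4B2"

print(bool(ALNUM.fullmatch(not_a_guid)))  # True: the alphanumeric class is too permissive
print(bool(HEX.fullmatch(not_a_guid)))    # False
print(bool(HEX.fullmatch(upper_guid)))    # True, thanks to IGNORECASE
```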

    The only small problem is that, in practice, the "Theoretical" version yielded an out-of-memory error...

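    So instead, feed findstr small chunks at a time (the "BFI" method: characters 0-8, 9-13, 14-23 and 24-35 of the candidate are tested separately, each against a short pattern), and it appears happy...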
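
The chunked check can be mirrored in Python to see why it is equivalent to the full pattern: each slice has exactly the length of its small pattern, so the four short matches together cover all 36 characters. This sketch assumes `re.match` with `re.IGNORECASE` behaves like `findstr /b /i` for these simple classes; `CHUNKS`, `starts_with_guid` and `find_guid_in_chunks` are hypothetical names.

```python
import re

HN = "[a-f0-9]"
HN4 = HN * 4
HN8 = HN4 * 2

# (start, length, pattern) for each chunk, mirroring the batch tests on
# %str:~0,9%, %str:~9,5%, %str:~14,10% and %str:~24,12%.
CHUNKS = [
    (0, 9, HN8 + "-"),
    (9, 5, HN4 + "-"),
    (14, 10, f"{HN4}-{HN4}-"),
    (24, 12, HN4 + HN8),
]


def starts_with_guid(s):
    """True if every fixed-width slice of s matches its short pattern."""
    return all(
        re.match(pat, s[start:start + length], re.IGNORECASE)
        for start, length, pat in CHUNKS
    )


def find_guid_in_chunks(s):
    """Trim leading characters until the chunked check succeeds."""
    while s:
        if starts_with_guid(s):
            return s[:36]
        s = s[1:]
    return None


print(find_guid_in_chunks("(UUID: 359f975d-2649-4e20-b7c0-b452aaaca4b2)"))
```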

    I've done no further testing, and I predict stormy weather if your text string contains characters that cmd treats as special: the usual suspects such as redirectors, % and rabbit's ears (double quotes).