Which regex method is best for validating user input? (for /f with delims vs. echo %var%|Findstr /ri)


I would like to validate a user's input and limit it to alphanumeric characters only (underscores may be allowed as well), but I'm not sure which method is best for this.

I've seen various examples on SA and the first one that raises some questions for me is the following one:

:input
set "in="
set /p "in=Please enter your username: "

ECHO(%in%|FINDSTR /ri "^[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ][0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]*$" >nul || (
    goto input
)

I see a second character class that's identical to the first one (except for the leading ^ and the trailing *$).

Why are the extra character class and the ^ and *$ needed when the following also works?:

:input
set "in="
set /p "in=Please enter your username: "

ECHO(%in%|FINDSTR /ri "[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]" >nul || (
    goto input
)

Finally, there is the FOR /F loop method that I've noticed on here as well:

for /f "delims=1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ" %%a in ("%in%") do goto :input

Is there any (dis)advantage in using this over the aforementioned FINDSTR regex one?


Solution

  • For safely validating user input, both methods are reliable, but you must improve them:


    findstr method

    First, let us look at the search string ^[...][...]*$ (where ... stands for the set of characters inside a character class). A character class [...] matches any single character from the set. The * means repetition, that is, zero or more occurrences, so [...]* matches zero or more characters from the set, and [...][...]* therefore matches one or more. The leading ^ anchors the match to the beginning of the line and the trailing $ anchors it to the end, so with both anchors the entire line must match the search string. Without the anchors, a single allowed character anywhere in the line is enough for findstr to report a match, which is why your second example accepts input such as ab#cd.

    Concerning character classes [...]: according to the thread What are the undocumented features and limitations of the Windows FINDSTR command?, ranges in classes are buggy. For instance, the class [A-Z] matches the lowercase letters b to z, and [a-z] matches the capital letters A to Y (this does not matter for a case-insensitive search, that is, when /I is given); the class [0-9] may match ² or ³, depending on the current code page; and [A-Z] and [a-z] may match special letters like Á or á, also depending on the current code page. Hence, to safely match certain characters only, do not use ranges but specify each character individually, like [0123456789], [ABCDEFGHIJKLMNOPQRSTUVWXYZ] or [abcdefghijklmnopqrstuvwxyz].
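
    The range problem can be checked on your own machine with a small sketch like the one below. It searches case-sensitively (no /I) for the single lowercase letter b, once with a range and once with an explicit list. The variable names RANGE and LIST are made up for illustration, and the result of the range test depends on the findstr behaviour reported in the thread, so treat it as something to verify rather than a guarantee.

```batch
@echo off
rem Range notation: reported in the thread cited above to match lowercase b as well.
echo(b| findstr /R "^[A-Z]$" > nul && (set "RANGE=match") || (set "RANGE=no match")

rem Explicit list: only the listed capital letters can match.
echo(b| findstr /R "^[ABCDEFGHIJKLMNOPQRSTUVWXYZ]$" > nul && (set "LIST=match") || (set "LIST=no match")

echo [A-Z] against b:         %RANGE%
echo explicit list against b: %LIST%
```

    The explicit list should report no match; if the range reports a match on your system, you are seeing the collation issue described above.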

    All this leads us to the following findstr command line:

    findstr /R /I "^[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ][0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]*$"
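
    To see what the anchors buy you, the following sketch runs the unanchored and the anchored search string against one sample value. The value ab#cd and the variable names SAMPLE, CLASS, UNANCHORED and ANCHORED are made up for illustration; the sample deliberately avoids characters that are special to cmd.

```batch
@echo off
rem Hypothetical sample input: letters plus one disallowed character (#).
set "SAMPLE=ab#cd"
set "CLASS=[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]"

rem Unanchored: one allowed character anywhere in the line is enough.
echo(%SAMPLE%| findstr /R /I "%CLASS%" > nul && (set "UNANCHORED=accepted") || (set "UNANCHORED=rejected")

rem Anchored: every character from start (^) to end ($) must be allowed.
echo(%SAMPLE%| findstr /R /I "^%CLASS%%CLASS%*$" > nul && (set "ANCHORED=accepted") || (set "ANCHORED=rejected")

echo unanchored: %UNANCHORED%
echo anchored:   %ANCHORED%
```

    The unanchored search accepts the sample because it contains at least one allowed character, while the anchored search rejects it because of the #.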
    

    Nevertheless, the whole approach with the piped echo might still fail, because special characters like ", &, ^, %, !, (, ), <, >, | could lead to syntax errors or other unintended behaviour. To avoid that, we need to use delayed expansion, so that the special characters are hidden from the command parser. However, since a pipe (|) starts a new cmd instance for each side (both inherit the current environment), the actual variable expansion must happen in the left child cmd instance rather than in the parent one, like this:

    :INPUT
    set "IN="
    set /P IN="Please enter your username: "
    
    cmd /V /C echo(^^!IN^^!| findstr /R /I "^[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ][0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]*$" > nul || goto :INPUT
    

    The extra explicit cmd instance is needed to enable delayed expansion (/V), because the instances initiated by the pipe have delayed expansion disabled.

    The doubled escaping of the exclamation marks (^^!) is only needed if delayed expansion is also enabled in the parent cmd instance; if it is not, single escaping (^!) is sufficient, but doubled escaping does no harm.


    for /F method

    This approach makes life easier, because there is no pipe involved and so, you do not have to deal with multiple cmd instances, but there is still room for improvement. Again, special characters may cause trouble, so delayed expansion needs to be enabled.

    The for /F loop ignores empty lines and those beginning with the default eol character, the semicolon (;). To disable the eol option, simply set it to one of the delimiter characters, so eol becomes hidden behind delims. Empty lines are not iterated, so the goto command in your approach would never execute in case of empty user input. Therefore, we must catch empty user input explicitly, using an if statement. All this leads to the following code:

    setlocal EnableDelayedExpansion
    :INPUT
    set "IN="
    set /P IN="Please enter your username: "
    
    if not defined IN goto :INPUT
    for /F "delims=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ eol=0" %%Z in ("!IN!") do goto :INPUT
    
    endlocal
    

    This approach detects capital letters only; to include small letters as well, you have to add them to the delims option: delims=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.

    Note that the variable IN is no longer available after endlocal, but this should be the very last command of your script anyway.
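
    How the delims trick decides can be traced with a non-interactive sketch that feeds a few fixed values through the same for /F line. The sample values and the variable RESULT are made up for illustration; the third value starts with a semicolon to show what the eol=0 part is for.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Hypothetical samples: all allowed, one disallowed character, leading semicolon.
rem The inner loop body runs only if something is left after stripping all delimiters.
for %%S in ("ABC123" "AB_12" ";ABC") do (
    set "IN=%%~S"
    set "RESULT=valid"
    for /F "delims=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ eol=0" %%Z in ("!IN!") do set "RESULT=invalid, leftover %%Z"
    echo(!IN!: !RESULT!
)
```

    The first value should be reported as valid and the other two as invalid. If you drop eol=0, the third value is skipped as a comment line by the default eol character and therefore slips through as valid.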

    To detect whether or not a for /F loop iterated, there is an undocumented feature we can make use of: for /F returns a non-zero exit code if it does not iterate, hence the conditional execution operators && and || can be used. When the loop does not iterate (the input is empty or consists of allowed characters only), || fires; when a disallowed character makes it iterate, && fires. For this to work, the for /F loop must be enclosed within parentheses:

    setlocal EnableDelayedExpansion
    :INPUT
    set "IN="
    set /P IN="Please enter your username: "
    
    if not defined IN goto :INPUT
    (for /F "delims=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ eol=0" %%Z in ("!IN!") do rem/) && goto :INPUT
    
    endlocal
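
    The exit-code behaviour can also be observed in isolation. The sketch below uses both operators on one fixed value; the value AB_12 and the variable VERDICT are made up for illustration, and the result relies on the undocumented for /F behaviour described above, so it is worth verifying on your Windows version.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Hypothetical sample value containing one disallowed character (the underscore).
set "IN=AB_12"

rem && fires when the loop iterated, || when it did not (undocumented behaviour).
(for /F "delims=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ eol=0" %%Z in ("!IN!") do rem/) && (set "VERDICT=invalid") || (set "VERDICT=valid")

echo(!IN!: !VERDICT!
```

    Changing the sample to a value made of allowed characters only, such as AB12, should flip the verdict to valid.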