I would like to validate a user's input and limit it to alphanumeric characters only (underscores may be allowed as well), but I'm not sure which method is best for this.
I've seen various examples on SA, and the first one that raises some questions for me is the following:
:input
set "in="
set /p "in=Please enter your username: "
ECHO(%in%|FINDSTR /ri "^[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ][0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]*$" >nul || (
goto input
)
I have also seen a second variant that is identical to the first one, except for the extra character class, the leading `^` and the trailing `*$`.
Why are the extra class and the `^` and `*$` needed when the following also works?
:input
set "in="
set /p "in=Please enter your username: "
ECHO(%in%|FINDSTR /ri "[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]" >nul || (
goto input
)
Finally, I've noticed the `FOR /F` loop method on here as well:
for /f "delims=1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ" %%a in ("%in%") do goto :input
Is there any (dis)advantage in using this over the aforementioned FINDSTR regex method?
For safely validating user input, both methods are reliable, but you must improve them:
findstr method
First, let us look at a search string like `^[...][...]*$`, where `...` stands for a set of characters, so `[...]` is a character class. A character class `[...]` matches any one character from the set `...`; `*` means repetition, that is, zero or more occurrences, hence `[...]*` matches zero or more characters from the set; therefore, `[...][...]*` matches one or more characters from the set. The leading `^` anchors the match to the beginning of the line and the trailing `$` anchors it to the end; when both anchors are specified, the entire line must match the search string. Without the anchors, a single class `[...]` matches any line that contains at least one character from the set, so an input like `abc-123` would be accepted.
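To make the difference between the two search strings concrete, here is a minimal sketch that feeds one fixed test value to both of them. The value `abc-123` and the echoed labels are illustrative assumptions, not part of the original scripts.

```batch
@echo off
rem Unanchored search string: one matching character anywhere in the line is enough.
echo(abc-123| findstr /R /I "[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]" > nul && (echo unanchored: accepted) || (echo unanchored: rejected)
rem Anchored search string: every character of the line must come from the set.
echo(abc-123| findstr /R /I "^[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ][0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]*$" > nul && (echo anchored: accepted) || (echo anchored: rejected)
```

The first command should report the value as accepted and the second as rejected, because the hyphen is not in the set.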
Concerning character classes `[...]`: according to the thread What are the undocumented features and limitations of the Windows FINDSTR command?, ranges in classes are buggy. For instance, the class `[A-Z]` also matches the small letters `b` to `z`, and `[a-z]` also matches the capital letters `A` to `Y` (this does not matter for a case-insensitive search, so when `/I` is given); the class `[0-9]` may match `²` or `³`, depending on the current code page; and `[A-Z]` and `[a-z]` may match special letters like `Á` or `á`, also depending on the current code page. Hence, to safely match certain characters only, do not use ranges, but specify each character individually, like `[0123456789]`, `[ABCDEFGHIJKLMNOPQRSTUVWXYZ]` or `[abcdefghijklmnopqrstuvwxyz]`.
All this leads us to the following `findstr` command line:
findstr /R /I "^[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ][0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]*$"
Nevertheless, the whole approach with the piped `echo` might still fail, because special characters like `"`, `&`, `^`, `%`, `!`, `(`, `)`, `<`, `>`, `|` in the input could lead to syntax errors or other unintended behaviour. To avoid that, we need delayed expansion, which hides the special characters from the command parser. However, since a pipe (`|`) starts a new `cmd` instance for either side (each of which inherits the current environment), we have to make sure that the actual variable expansion happens in the left child `cmd` instance rather than in the parent one, like this:
:INPUT
set "IN="
set /P IN="Please enter your username: "
cmd /V /C echo(^^!IN^^!| findstr /R /I "^[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ][0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]*$" > nul || goto :INPUT
The extra explicit `cmd` instance is needed to enable delayed expansion (`/V`), because the instances initiated by the pipe have delayed expansion disabled.
The doubled escaping of the exclamation marks (`^^!`) is only needed in case delayed expansion is also enabled in the parent `cmd` instance; if not, single escaping (`^!`) is sufficient, but doubled escaping does no harm.
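As a rough way to try the command line above without typing at a prompt, the following sketch assigns a fixed test value that contains an ampersand. The value `a&b` and the echoed words are assumptions made for illustration only; in the real script `IN` comes from `set /P`.

```batch
@echo off
rem Hypothetical test value with a special character in it.
set "IN=a&b"
rem The inner cmd /V instance expands the variable, so the ampersand stays plain text.
cmd /V /C echo(^^!IN^^!| findstr /R /I "^[0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ][0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ]*$" > nul && (echo accepted) || (echo rejected)
```

If the expansion works as described, the value should simply be rejected, with no syntax error and no attempt to run `b` as a command.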
for /F method
This approach makes life easier, because no pipe is involved, so you do not have to deal with multiple `cmd` instances, but there is still room for improvement. Again, special characters may cause trouble, so delayed expansion needs to be enabled.
The `for /F` loop ignores empty lines as well as lines that begin with the default `eol` character, the semicolon `;`. To disable the `eol` option, simply set it to one of the delimiter characters, so `eol` becomes hidden behind `delims`. Since empty lines are not iterated, the `goto` command in your approach would never execute in case of empty user input. Therefore, we must catch empty user input explicitly, using an `if` statement. All this leads to the following code:
setlocal EnableDelayedExpansion
:INPUT
set "IN="
set /P IN="Please enter your username: "
if not defined IN goto :INPUT
for /F "delims=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ eol=0" %%Z in ("!IN!") do goto :INPUT
endlocal
This approach accepts capital letters only; to allow small letters as well, you have to add them to the `delims` option: `delims=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz`.
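Since the question mentions that underscores may be allowed as well, the same idea can be extended by adding `_` to `delims`. The sketch below checks two fixed test values instead of prompting; the values, the `:CHECK` label and the echoed words are hypothetical and only serve the demonstration.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical test values; in the real script IN comes from set /P.
for %%I in ("user_01" "bad-name") do (
    set "IN=%%~I"
    call :CHECK
)
endlocal
exit /B 0

:CHECK
rem The loop iterates only if something other than a delimiter is left over.
for /F "delims=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_ eol=0" %%Z in ("!IN!") do (
    echo rejected: !IN!
    exit /B 1
)
echo accepted: !IN!
exit /B 0
```

With these values, `user_01` should be accepted and `bad-name` rejected, because the hyphen is not among the delimiters.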
Note that the variable `IN` is no longer available beyond `endlocal`, but this should be the very last command of your script anyway.
To detect whether or not a `for /F` loop iterated, there is an undocumented feature we can make use of: `for /F` returns a non-zero exit code if it does not iterate, hence the conditional execution operators `&&` and `||` can be used. When the loop does not iterate (empty input or allowed characters only), `||` fires; when the input contains any other character, the loop iterates and `&&` fires. For this to work, the `for /F` loop must be enclosed within parentheses:
setlocal EnableDelayedExpansion
:INPUT
set "IN="
set /P IN="Please enter your username: "
if not defined IN goto :INPUT
(for /F "delims=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ eol=0" %%Z in ("!IN!") do rem/) && goto :INPUT
endlocal