
Dash it! A batch string-comparison conundrum


Conventional wisdom holds that

if %a% lss %b% echo %a% is less than %b%

will display the message if

  • %a% is less than %b% AND both %a% and %b% can be resolved to an integer

OR if

  • the string %a% is "less" than the string %b% according to the collating sequence, implicitly ASCII (/i being a complication in this case)

and that

if "%a%" lss "%b%" echo "%a%" is less than "%b%"

does the same, bar the integer condition, since neither "%a%" nor "%b%" can be resolved to an integer because of the quotes.

Hence, consider the following:

@echo off
setlocal
FOR %%e IN (2 -4 6 -9 123) DO FOR %%o IN (3 -3 7 -7 -164) DO (
 IF "%%e" lss "%%o" (ECHO "%%e" lss "%%o")
 IF "%%e" equ "%%o" (ECHO "%%e" equ "%%o")
 IF "%%e" gtr "%%o" (ECHO "%%e" gtr "%%o")
)

which yields:

"2" lss "3"
"2" lss "-3"
"2" lss "7"
"2" lss "-7"
"2" gtr "-164"
"-4" gtr "3"
"-4" gtr "-3"
"-4" lss "7"
"-4" lss "-7"
"-4" gtr "-164"
"6" gtr "3"
"6" gtr "-3"
"6" lss "7"
"6" lss "-7"
"6" gtr "-164"
"-9" gtr "3"
"-9" gtr "-3"
"-9" gtr "7"
"-9" gtr "-7"
"-9" gtr "-164"
"123" lss "3"
"123" lss "-3"
"123" lss "7"
"123" lss "-7"
"123" lss "-164"

This indicates that the leading - is ignored and the arguments are compared as strings.

Why?


Solution

  • As explained in the answer to Symbol equivalent to NEQ, LSS, GTR, etc. in Windows batch files, when one of the comparison operators EQU, NEQ, LSS, LEQ, GTR, GEQ is used and the conversion of at least one of the two strings to a signed 32-bit integer fails (as it does when a string begins with "), the Windows Command Processor cmd.exe compares the two strings with the function lstrcmpW, or with the function lstrcmpiW when the option /I is also used for a case-insensitive comparison.
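
    A minimal Python sketch of that first decision, under stated assumptions: the helper names try_int and comparison_kind are hypothetical, and only an optionally signed run of decimal digits counts as a successful conversion. cmd.exe also accepts octal and hexadecimal notation and has its own handling of out-of-range values, none of which is modelled here.

```python
import re

# Assumption: an optionally signed run of decimal digits is a successful
# conversion. Octal (leading 0), hexadecimal (0x...) and out-of-range
# values are deliberately not modelled.
_DECIMAL = re.compile(r"[+-]?[0-9]+")


def try_int(operand: str):
    """Return the integer value of operand, or None if conversion fails."""
    if _DECIMAL.fullmatch(operand):
        return int(operand)
    return None


def comparison_kind(left: str, right: str) -> str:
    """Report which kind of comparison IF would perform (simplified)."""
    if try_int(left) is not None and try_int(right) is not None:
        return "integer"
    # At least one conversion failed, e.g. because of a leading double quote.
    return "string"


print(comparison_kind("-4", "3"))      # integer
print(comparison_kind('"-4"', '"3"'))  # string
print(comparison_kind("-4", '"3"'))    # string
```

    A single quoted operand is enough to force the string comparison.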

    The function lstrcmpW returns a negative integer if, at the first position where the two strings differ, the character of the first string is less than the corresponding character of the second string. It returns 0 if the two strings are equal, and a positive integer if, at the first differing position, the character of the first string is greater than the corresponding character of the second string.

    With the comparison operators EQU, NEQ, LSS, LEQ, GTR, GEQ, cmd.exe evaluates the integer returned by lstrcmpW as follows:

    1. EQU is true if lstrcmpW returns 0.
      Example: if "2" EQU "2" echo Strings are equal.
    2. NEQ is true if lstrcmpW returns a negative or a positive integer.
      Example: if "+2" NEQ "2" echo Strings are not equal.
    3. LSS is true if lstrcmpW returns a negative integer.
      Example: if "+2" LSS "2" echo First string is less than second string.
    4. LEQ is true if lstrcmpW returns a negative integer or 0.
      Example: if "+2" LEQ "+2" echo First string is less than or equal to second string.
    5. GTR is true if lstrcmpW returns a positive integer.
      Example: if "2" GTR "+2" echo First string is greater than second string.
    6. GEQ is true if lstrcmpW returns a positive integer or 0.
      Example: if "+2" GEQ "+2" echo First string is greater than or equal to second string.

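    The six rules above amount to a table from operator to a test on the sign of the lstrcmpW result. The Python sketch below models that table. ordinal_compare and if_string are hypothetical names; ordinal_compare is a stand-in for lstrcmpW that uses plain code point order, which is assumed to agree with lstrcmpW only for simple strings like the quoted digits and plus signs in these examples, not for hyphens, apostrophes or letters.

```python
# Each IF operator is a test on the sign of the lstrcmpW-style result.
OPERATORS = {
    "EQU": lambda r: r == 0,
    "NEQ": lambda r: r != 0,
    "LSS": lambda r: r < 0,
    "LEQ": lambda r: r <= 0,
    "GTR": lambda r: r > 0,
    "GEQ": lambda r: r >= 0,
}


def ordinal_compare(first: str, second: str) -> int:
    """Stand-in for lstrcmpW: -1, 0 or 1 by plain code point order.

    Assumption: adequate only for simple strings such as "+2" and "2";
    it does NOT implement the word sort used by lstrcmpW.
    """
    return (first > second) - (first < second)


def if_string(first: str, op: str, second: str) -> bool:
    """Evaluate IF first op second as a string comparison (simplified)."""
    return OPERATORS[op](ordinal_compare(first, second))


examples = [
    ('"2"', "EQU", '"2"'),
    ('"+2"', "NEQ", '"2"'),
    ('"+2"', "LSS", '"2"'),
    ('"+2"', "LEQ", '"+2"'),
    ('"2"', "GTR", '"+2"'),
    ('"+2"', "GEQ", '"+2"'),
]
for first, op, second in examples:
    print(first, op, second, if_string(first, op, second))  # all True
```
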
    These examples already show that using the integer comparison operators EQU, NEQ, LSS, LEQ, GTR, GEQ with double-quoted strings is generally not a good idea.

    The function lstrcmpW compares the two strings character by character, so a sign character such as + in only one of the strings is already enough to produce IF results that are often unexpected.

    But there is one more fact to consider, stated in the Remarks section of the Microsoft documentation of the function lstrcmpW:

    The lstrcmp function uses a word sort, rather than a string sort. A word sort treats hyphens and apostrophes differently than it treats other symbols that are not alphanumeric, in order to ensure that words such as "coop" and "co-op" stay together within a sorted list. For a detailed discussion of word sorts and string sorts, see Handling Sorting in Your Applications.

    The function lstrcmp and its wide-character variant lstrcmpW call the function CompareStringEx to do the string comparison. This function is often used for sorting strings such as words. Its parameter dwCmpFlags selects one of three sort options (only two before Windows 7):

    1. The default word sort. All hyphens - and all apostrophes ' are ignored in a first pass over the two strings: whenever the current character of the first or the second string is - or ', that character is skipped and the comparison moves on to the next character of the same string. So at first only the characters other than hyphens and apostrophes of the first string are compared with those of the second string.
    2. A strict string sort, selected with the flag SORT_STRINGSORT, which really compares each character of the first string with the corresponding character of the second string, one after the other.
    3. A numeric sort, selected with the flag SORT_DIGITSASNUMBERS available since Windows 7, which interprets "2" as less than "10" because the number of digits is taken into account when comparing two strings that both represent decimal integers.

    The Windows Command Processor does not call the function CompareStringEx directly. cmd.exe uses the function lstrcmpW, which always calls CompareStringEx with the value 0 for the parameter dwCmpFlags. The result is a word sort that ignores hyphens and apostrophes in the first pass of comparing the two strings.
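
    The first pass of that word sort can be sketched in Python as a two-index walk that skips hyphens and apostrophes in both strings, as described for the default sort option. This is a simplified model, not the Windows implementation: the name first_pass_compare is hypothetical, the remaining characters are compared by code point, and a result of 0 only means the strings are equal apart from hyphens and apostrophes. Applied to the values of the question, it yields the same 25 results as the cmd.exe output shown there.

```python
IGNORED = "-'"  # hyphen and apostrophe, skipped in the first pass


def first_pass_compare(first: str, second: str) -> int:
    """Return -1, 0 or 1, ignoring hyphens and apostrophes (simplified)."""
    i = j = 0
    while True:
        # Move each index past any hyphens and apostrophes.
        while i < len(first) and first[i] in IGNORED:
            i += 1
        while j < len(second) and second[j] in IGNORED:
            j += 1
        if i == len(first) or j == len(second):
            # At least one string is exhausted: the one with characters left is greater.
            return (i < len(first)) - (j < len(second))
        if first[i] != second[j]:
            # Assumption: code point order stands in for the linguistic order.
            return -1 if first[i] < second[j] else 1
        i += 1
        j += 1


# Reproduce the output of the batch file in the question.
NAMES = {-1: "lss", 0: "equ", 1: "gtr"}
grid = []
for e in ["2", "-4", "6", "-9", "123"]:
    for o in ["3", "-3", "7", "-7", "-164"]:
        result = first_pass_compare(f'"{e}"', f'"{o}"')
        grid.append(f'"{e}" {NAMES[result]} "{o}"')
print("\n".join(grid))
```
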

    For that reason, comparing number strings enclosed in " is useless if one or both strings begin with the hyphen character - used as a minus sign.

    Examples:

    if "2" LSS "-3" echo String "2" is less than string "-3".
    if "-2" LSS "-3" echo String "-2" is less than string "-3".
    if "---2" LSS "---3" echo String "---2" is less than string "---3".
    if "'2'" LSS "-3" echo String "'2'" is less than string "-3".
    

    In all four examples, the comparison effectively starts with just "2" versus "3", i.e. the wide-character arrays 0x0022 0x0032 0x0022 0x0000 versus 0x0022 0x0033 0x0022 0x0000.

    But does the number of characters never matter?

    It does. The number of characters matters when the two strings are equal after ignoring all hyphens and apostrophes, as the following examples show:

    if "-2" LSS "--2" echo String "-2" is less than string "--2".
    if "--2" GTR "-2" echo String "--2" is greater than string "-2".
    if "-2-" GTR "-2" echo String "-2-" is greater than string "-2".
    if "--2" EQU "--2" echo String "--2" is equal string "--2".
    if "2" LSS "'2" echo String "2" is less than string "'2".
    if "-2-" LSS "--2" echo String "-2-" is less than string "--2".
    if "--2" GTR "-2-" echo String "--2" is greater than string "-2-".
    

    That is important for a word sort. In a sorted list of words, coop with just four characters should come before co-op with five characters, although the two strings are identical once the hyphen is ignored.

    The last two examples are interesting. Both strings have the same length and the same characters apart from hyphens, but the digit 2 is at character index 2 in "-2-" and at index 3 in "--2" (counting the leading double quote as index 0). The string whose non-hyphen and non-apostrophe characters appear at lower character index positions is sorted before the string in which the same characters appear at higher index positions.

    The ignored characters themselves also matter for the comparison result if the two strings are identical after ignoring all hyphens and apostrophes but differ when these characters are taken into account:

    if "-2" GTR "'2" echo String "-2" is greater than string "'2".
    if "''2" LSS "-2" echo String "''2" is less than string "-2".
    if "'2'" LSS "-2" echo String "'2'" is less than string "-2".
    if "'2" LSS "-2-" echo String "'2" is less than string "-2-".
    if "-2'" GTR "-2" echo String "-2'" is greater than string "-2".
    

    The apostrophe ' has the code value 0x0027, while the hyphen - has the greater code value 0x002D. If two strings are identical after ignoring all hyphens and apostrophes but differ when these characters are taken into account, a hyphen compared with an apostrophe at the same character position makes the string with the hyphen the greater one. As "''2" LSS "-2" shows, this decides the result even when the string with the apostrophes has more characters.
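
    The following Python sketch is a hypothetical sort key fitted to the examples in this answer only. It is not Microsoft's algorithm and may disagree with lstrcmpW on other inputs. It compares the strings without hyphens and apostrophes first, then a list of (negated position, weight) pairs for the ignored characters, where an apostrophe weighs less than a hyphen. Negating the position is an assumption chosen purely so that "-2-" LSS "--2" comes out as observed; the names hypothetical_key and compare are illustrative.

```python
WEIGHT = {"'": 0, "-": 1}  # apostrophe 0x0027 ranks below hyphen 0x002D


def hypothetical_key(text: str):
    """Sort key fitted to this answer's examples (NOT Microsoft's algorithm)."""
    # Primary part: the string without hyphens and apostrophes.
    primary = "".join(c for c in text if c not in WEIGHT)
    # Secondary part: one (negated position, weight) pair per ignored character.
    specials = tuple((-pos, WEIGHT[c]) for pos, c in enumerate(text) if c in WEIGHT)
    return (primary, specials)


def compare(first: str, second: str) -> int:
    a, b = hypothetical_key(first), hypothetical_key(second)
    return (a > b) - (a < b)


cases = [
    ('"-2"', "LSS", '"--2"'), ('"--2"', "GTR", '"-2"'),
    ('"-2-"', "GTR", '"-2"'), ('"--2"', "EQU", '"--2"'),
    ('"2"', "LSS", "\"'2\""), ('"-2-"', "LSS", '"--2"'),
    ('"--2"', "GTR", '"-2-"'), ('"-2"', "GTR", "\"'2\""),
    ("\"''2\"", "LSS", '"-2"'), ("\"'2'\"", "LSS", '"-2"'),
    ("\"'2\"", "LSS", '"-2-"'), ("\"-2'\"", "GTR", '"-2"'),
]
EXPECTED = {"LSS": -1, "EQU": 0, "GTR": 1}
for first, op, second in cases:
    assert compare(first, second) == EXPECTED[op], (first, op, second)
print("all", len(cases), "examples reproduced")
```

    Any pair of strings not covered by these examples should be checked against cmd.exe itself rather than trusted to this key.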

    Conclusion: comparing two number strings enclosed in " with the integer comparison operators EQU, NEQ, LSS, LEQ, GTR, GEQ works only if both strings represent decimal numbers without a plus or minus sign and without any characters other than 0123456789, and if both have the same number of digits. Before running the comparison, prepend 0 to the number string with fewer digits until both number strings have the same number of characters (digits).
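
    A minimal Python sketch of that padding rule, assuming unsigned decimal digit strings; zero_pad and compare_padded are hypothetical names. Once both strings have the same number of digits, comparing them as double-quoted strings gives the same order as comparing the numbers, because the digits 0 to 9 sort in numeric order.

```python
DIGITS = set("0123456789")


def zero_pad(number: str, width: int) -> str:
    """Prepend 0 characters until number is width characters long."""
    return number.rjust(width, "0")


def compare_padded(first: str, second: str) -> int:
    """Return -1, 0 or 1 for two unsigned decimal digit strings.

    Assumption: both operands contain only the digits 0-9, no sign.
    The padded values are wrapped in double quotes, as in the IF examples,
    and then compared as strings by code point.
    """
    for operand in (first, second):
        if not operand or not set(operand) <= DIGITS:
            raise ValueError(f"not an unsigned decimal number: {operand!r}")
    width = max(len(first), len(second))
    a = '"' + zero_pad(first, width) + '"'
    b = '"' + zero_pad(second, width) + '"'
    return (a > b) - (a < b)


NAMES = {-1: "LSS", 0: "EQU", 1: "GTR"}
for x, y in [("9", "10"), ("123", "3"), ("42", "042")]:
    print(x, NAMES[compare_padded(x, y)], y)  # 9 LSS 10, 123 GTR 3, 42 EQU 042
```
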