vba, vbscript, qtp, hp-uft

Unexpected result of StrComp when using vbTextCompare


The goal

Compare two strings lexicographically, ignoring case.

Possible solutions using StrComp

Consider the following script:

val1 = "test9999"
val2 = "TEST_59895"

LexCompare val1, val2, vbBinaryCompare
LexCompare LCase(val1), LCase(val2), vbBinaryCompare
LexCompare UCase(val1), UCase(val2), vbBinaryCompare

LexCompare val1, val2, vbTextCompare
LexCompare LCase(val1), LCase(val2), vbTextCompare
LexCompare UCase(val1), UCase(val2), vbTextCompare

WScript.Echo "ANSI values: '9'=" & Asc("9") & ", '_'=" & Asc("_")

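' Compares two strings with StrComp and echoes the result in words.
' compareType is vbBinaryCompare (0) or vbTextCompare (1).
Sub LexCompare(string1, string2, compareType)
    Dim result
    result = ""
    ' StrComp returns -1, 0, or 1 for non-Null arguments.
    Select Case StrComp(string1, string2, compareType)
        Case -1
            result = "is smaller than"
        Case 0
            result = "is identical to"
        Case 1
            result = "is greater than"
    End Select
    WScript.Echo "'" & string1 & "' " & result & " '" & string2 & "', compareType: " & compareType
End Sub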

Output:

'test9999' is greater than 'TEST_59895', compareType: 0
'test9999' is smaller than 'test_59895', compareType: 0
'TEST9999' is smaller than 'TEST_59895', compareType: 0
'test9999' is greater than 'TEST_59895', compareType: 1
'test9999' is greater than 'test_59895', compareType: 1
'TEST9999' is greater than 'TEST_59895', compareType: 1
ANSI values: '9'=57, '_'=95

I would expect "test9999" to be lexicographically smaller than "TEST_59895" when case is ignored, because at the first position where the two strings differ, '9' (57) is smaller than '_' (95).

Questions

  • What am I missing?
  • I understand the results when using vbBinaryCompare, and as a workaround I will apply LCase or UCase to both variables.
  • But why doesn't StrComp come to the same conclusion using vbTextCompare? I thought the very definition of vbTextCompare was to compare ignoring case?

Solution

  • vbTextCompare applies the rules of Option Compare Text to just that one comparison.

    According to the documentation, Option Compare Text does not rely on ANSI values. Instead, it relies on a

    case-insensitive text sort order determined by your system's locale.

    A locale can dictate any sort order, and yours happens to sort digits after underscores. That is why "test9999" compares as greater than "TEST_59895" with vbTextCompare, even though case is ignored.
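
If what you need is a case-insensitive comparison by character code that does not depend on the locale, one option is the workaround from the question: fold the case of both strings first, then compare with vbBinaryCompare. The following is only a minimal sketch; CompareIgnoreCaseOrdinal is a hypothetical helper name, not a built-in function.

```vbscript
Option Explicit

' Hypothetical helper (not a built-in): folds both strings to lower case,
' then compares them by character code with vbBinaryCompare, so the
' locale's sort order plays no part in the result.
Function CompareIgnoreCaseOrdinal(string1, string2)
    CompareIgnoreCaseOrdinal = StrComp(LCase(string1), LCase(string2), vbBinaryCompare)
End Function

' -1, matching the LCase + vbBinaryCompare line in the question's output
WScript.Echo CompareIgnoreCaseOrdinal("test9999", "TEST_59895")

' Locale-dependent; 1 on the asker's system according to the question's output
WScript.Echo StrComp("test9999", "TEST_59895", vbTextCompare)
```

Note that folding with LCase and folding with UCase do not always give the same ordering. The underscore (95) lies between the upper-case letters (65 to 90) and the lower-case letters (97 to 122), so a string containing '_' sorts before letters after LCase but after letters after UCase. Pick one and use it consistently.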