windows vb.net sorting rule

What is the correct order of these two names when they are sorted?


I have two folders named CLP2_v6 and CLP_DE0_v7. When I sort them using an IComparer, the result is:

Using StrCmpLogicalW (Windows):
CLP_DE0_v7
CLP2_v6

I'm confused, because when I sorted the same names with the Text Line Sorter tool, I got a different answer:

Using TextLineSorter:
CLP2_v6
CLP_DE0_v7

What I want my program to display is this:

Preferred:
CLP2_v6
CLP_DE0_v7

Here is my VB.NET code:

Imports System
Imports System.Collections.Generic

Public Class StringCompare
    Implements IComparer(Of String)

    ' Windows API comparison that treats digit runs as numbers (used by Explorer).
    Declare Unicode Function StrCmpLogicalW Lib "shlwapi.dll" _
        (ByVal s1 As String, ByVal s2 As String) As Int32

    Public Function Compare(x As String, y As String) As Integer _
        Implements IComparer(Of String).Compare
        Return StrCmpLogicalW(x, y)
    End Function
End Class

Module Program
    Sub Main()
        Dim unsortedArray() As String = {
            "CLP_DE0_v7",
            "CLP2_v6"
        }

        Dim rc As New StringCompare()
        Console.WriteLine(vbLf & "Windows Sorting:")
        ' Sorts the array in place using the comparer above.
        Array.Sort(unsortedArray, rc)

        Console.WriteLine()
        For Each folderName As String In unsortedArray
            Console.WriteLine(folderName)
        Next
    End Sub
End Module

What is the correct order when sorting these two names? Does sorting have many rules to follow, or are there many different sorting standards?


Solution

  • There are many different approaches to sorting, depending on the required results, the context, or the available tools. Since you are using .NET, you are limited not by the technology but by your own requirements.

    Consider these situations:

    Culture specific

    'Array sorted by English culture   
    {"aa", "bb", "cc", "ch", "dd", "ee", "ff", "gg", "hh", "ii"}
    'Same array sorted by Czech culture   
    {"aa", "bb", "cc", "dd", "ee", "ff", "gg", "hh", "ch", "ii"}
    

    And have you ever heard of ě, ê, è, é? :)

    Where do you put them? Before "e", after "e", after "z"? That depends on your culture and your needs.
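
    To see this for yourself, here is a minimal sketch that sorts the same words under two cultures. SortByCulture is a hypothetical helper name, and the result assumes the runtime has collation data for the requested cultures (it will not differ if the app runs in invariant-globalization mode). With Czech rules available, "ch" should land after "hh", as in the arrays above.

    ```vbnet
    Imports System
    Imports System.Globalization

    Module CultureSortSketch
        ' Hypothetical helper: returns a sorted copy using the named culture's rules.
        Function SortByCulture(items As String(), cultureName As String) As String()
            Dim copy As String() = CType(items.Clone(), String())
            ' False = case-sensitive comparison under that culture.
            Dim comparer As StringComparer =
                StringComparer.Create(New CultureInfo(cultureName), False)
            Array.Sort(copy, comparer)
            Return copy
        End Function

        Sub Main()
            Dim words As String() = {"ch", "hh", "cc", "dd", "ii"}
            Console.WriteLine(String.Join(", ", SortByCulture(words, "en-US")))
            Console.WriteLine(String.Join(", ", SortByCulture(words, "cs-CZ")))
        End Sub
    End Module
    ```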

    Technology specific

    Let's say you have your ANSI string as an array of bytes. Sorting by byte value returns something different than sorting by character position in the alphabet.
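
    A rough way to see the same effect with .NET strings is to compare an ordinal sort (raw UTF-16 code units, the closest analogue to byte values) with a linguistic one. This is only a sketch: SortedCopy is a hypothetical helper, and the invariant-culture result assumes normal globalization data is available.

    ```vbnet
    Imports System

    Module OrdinalVsCultureSketch
        ' Hypothetical helper: returns a sorted copy using the given comparer.
        Function SortedCopy(items As String(), comparer As StringComparer) As String()
            Dim copy As String() = CType(items.Clone(), String())
            Array.Sort(copy, comparer)
            Return copy
        End Function

        Sub Main()
            Dim letters As String() = {"a", "B", "c"}

            ' Ordinal: "B" (code 66) comes before "a" (code 97), so this prints B, a, c.
            Console.WriteLine(String.Join(", ", SortedCopy(letters, StringComparer.Ordinal)))

            ' Linguistic: ordered by alphabet position rather than by code value.
            Console.WriteLine(String.Join(", ", SortedCopy(letters, StringComparer.InvariantCulture)))
        End Sub
    End Module
    ```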

    User-needs specific

    Is "a" greater than "A"? In general? For your specific need? Is a directory named "9" greater than a directory named "10"? Sort them as strings and you get {"10", "9"}; open the folder in Windows Explorer and you see {"9", "10"}; open it in Total Commander and you get {"10", "9"} again for the same directory.
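
    If you want "9" before "10" without calling into shlwapi.dll, one possible approach is a small managed comparer that compares digit runs by numeric value and everything else by character code. This is a sketch, not a reimplementation of StrCmpLogicalW: NaturalOrdinalComparer is a hypothetical name, it only treats ASCII digits as numbers, and it is case-sensitive.

    ```vbnet
    Imports System
    Imports System.Collections.Generic

    ' Hypothetical comparer: digit runs compare as numbers, other characters by code.
    Public Class NaturalOrdinalComparer
        Implements IComparer(Of String)

        Private Shared Function IsAsciiDigit(c As Char) As Boolean
            Return c >= "0"c AndAlso c <= "9"c
        End Function

        ' Reads the digit run starting at pos and advances pos past it.
        Private Shared Function ReadDigits(s As String, ByRef pos As Integer) As String
            Dim start As Integer = pos
            While pos < s.Length AndAlso IsAsciiDigit(s(pos))
                pos += 1
            End While
            ' Drop leading zeros so that "007" and "7" count as the same number.
            Return s.Substring(start, pos - start).TrimStart("0"c)
        End Function

        Public Function Compare(x As String, y As String) As Integer _
            Implements IComparer(Of String).Compare
            If x Is Nothing Then x = ""
            If y Is Nothing Then y = ""
            Dim i As Integer = 0
            Dim j As Integer = 0
            While i < x.Length AndAlso j < y.Length
                If IsAsciiDigit(x(i)) AndAlso IsAsciiDigit(y(j)) Then
                    Dim nx As String = ReadDigits(x, i)
                    Dim ny As String = ReadDigits(y, j)
                    ' Without leading zeros, the longer run is the larger number.
                    If nx.Length <> ny.Length Then Return nx.Length.CompareTo(ny.Length)
                    Dim c As Integer = String.CompareOrdinal(nx, ny)
                    If c <> 0 Then Return c
                Else
                    If x(i) <> y(j) Then Return x(i).CompareTo(y(j))
                    i += 1
                    j += 1
                End If
            End While
            ' Whichever string still has characters left sorts last.
            Return (x.Length - i).CompareTo(y.Length - j)
        End Function
    End Class

    Module NaturalSortDemo
        Sub Main()
            Dim dirs As String() = {"10", "9", "1"}
            Array.Sort(dirs, New NaturalOrdinalComparer())
            Console.WriteLine(String.Join(", ", dirs)) ' prints 1, 9, 10
        End Sub
    End Module
    ```

    Because non-digit characters are compared by code, this sketch also puts CLP2_v6 before CLP_DE0_v7 ("2" has a lower code than "_").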

    Conclusion

    You should define what you really need in your specific case, and then find a proper or easy way to do it. In .NET, your results will depend on Threading.Thread.CurrentThread.CurrentCulture, or on your own IComparer, which you can pass to Array.Sort, List(Of T).Sort, or the SortedList/SortedSet constructors.
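
    For the two folder names in the question, one option that gives the preferred order is the built-in StringComparer.Ordinal, which compares character codes and ignores the current culture. A minimal sketch (SortOrdinal is a hypothetical helper name):

    ```vbnet
    Imports System

    Module PreferredOrderSketch
        ' Hypothetical helper: returns a copy sorted by character code.
        Function SortOrdinal(names As String()) As String()
            Dim copy As String() = CType(names.Clone(), String())
            Array.Sort(copy, StringComparer.Ordinal)
            Return copy
        End Function

        Sub Main()
            ' "2" (code 50) is lower than "_" (code 95), so CLP2_v6 is printed first.
            For Each entry As String In SortOrdinal({"CLP_DE0_v7", "CLP2_v6"})
                Console.WriteLine(entry)
            Next
        End Sub
    End Module
    ```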

    Risks

    You should be aware that sorting differs between cultures. For example, creating and filling a SortedList(Of String, Object) under the "hu-HU" culture and then reading its items under the "cs-CZ" culture will cause weird exceptions in some cases, because the items are no longer in the order the current culture expects and the binary search gets confused.
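
    One way to reduce this risk, assuming you do not need linguistic ordering for the keys, is to pass an explicit culture-independent comparer when you create the list, so the ordering no longer depends on the thread's culture. A minimal sketch (BuildList is a hypothetical helper name):

    ```vbnet
    Imports System
    Imports System.Collections.Generic

    Module StableSortedListSketch
        ' Hypothetical helper: builds a list whose key order is fixed by StringComparer.Ordinal.
        Function BuildList(keys As String()) As SortedList(Of String, Object)
            Dim list As New SortedList(Of String, Object)(StringComparer.Ordinal)
            For Each k As String In keys
                list.Add(k, Nothing)
            Next
            Return list
        End Function

        Sub Main()
            Dim list As SortedList(Of String, Object) = BuildList({"ch", "hh", "cc"})
            ' Keys come back in character-code order under any culture: cc, ch, hh.
            Console.WriteLine(String.Join(", ", list.Keys))
        End Sub
    End Module
    ```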