
PowerShell sort command not working as expected


I used the following commands to test how the sort command orders the characters available on my keyboard.

$symb="a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z","²","1","2","3","4","5","6","7","8","9","0","°","+","&","é",'"',"'","(","-","è",[regex]::escape('`'),"_","ç","à",")","=","~","#","{","[","|","\","^","@","]","}","$","¨","ˆ","£","¤","ù","*","%","µ","<",",",";",":","!",">","?",".","/","§","€"; $symb|sort|ac file.txt;(gc file.txt)-join""

Here is what I get, both in a file and on the console.

'-!"#$%&()*,./:;?@[\]ˆ^_`{|}~¨£¤€+<=>§°µ012²3456789aAàbBcCçDdEeéèfFgGhHIiJjKkLlmMNnOoPpqQRrsStTuUùvVwWXxyYzZ

For about half of the lowercase/uppercase letter pairs the order is inverted; I expected it to always be "lowercase first, uppercase next". How can that be fixed?


Solution

  • PowerShell - unlike direct use of .NET types - is case-insensitive by default; you need to opt in if you want case-sensitive behavior.

    In the case of Sort-Object you need to use its -CaseSensitive switch:

    PS> -join ('a', 'B', 'A', 'b' | Sort-Object -CaseSensitive)
    aAbB
    

    As you expected, this results in lowercase letters sorting first, because in the (US-English) collation order lowercase letters have lower sorting weight than uppercase ones - even though with respect to their Unicode code points the relationship is reversed (e.g., [int] [char] 'a' is 97, whereas [int] [char] 'A' is 65).

    (Code-point-based sorting would apply if the array contained [char] instances, but PowerShell has no [char] literals, so a literal such as 'a' is a [string] of length 1; you can use explicit casts, however: -join ([char] 'A', [char] 'a' | Sort-Object -CaseSensitive) yields 'Aa', i.e. sorts uppercase first.)
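The difference between collation order and code-point order can be checked directly. The following is a minimal sketch, not part of the original answer: it assumes the invariant culture behaves like US-English for these letters, and the variable names are purely illustrative.

```powershell
# Ordinal (code-point) comparison: a positive result means 'a' sorts AFTER 'A'.
$ordinal = [string]::CompareOrdinal('a', 'A')

# Culture-sensitive, case-sensitive comparison (invariant culture assumed here):
# a negative result means 'a' sorts BEFORE 'A'.
$cultural = [string]::Compare('a', 'A', [StringComparison]::InvariantCulture)

# Sorting [string] values by code point requires going through .NET directly.
# [Array]::Sort() sorts the array in place.
[string[]] $letters = 'a', 'B', 'A', 'b'
[Array]::Sort($letters, [StringComparer]::Ordinal)
$ordinalSorted = -join $letters   # 'ABab': uppercase first

# Sort-Object -CaseSensitive uses the culture's collation order instead.
$culturalSorted = -join ('a', 'B', 'A', 'b' | Sort-Object -CaseSensitive)   # 'aAbB'
```

The ordinal comparer is only worth reaching for when code-point order is actually wanted; for the "lowercase first" result asked about here, Sort-Object -CaseSensitive is sufficient.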


    Without -CaseSensitive, the lowercase and uppercase variants of a given letter have equal sorting weight, so no particular ordering among them is guaranteed.

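    For instance, the following loop, which repeatedly shuffles and then sorts the same four letters, exits quickly because the output order soon differs between iterations: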

    $prevResult = $null
    while ($true) { 
      
      # Get a shuffled array of lower- and uppercase letters.
      $arr = 'a', 'B', 'A', 'b'
      $arr = $arr | Get-Random -Count $arr.Count
      
      # Sort it case-INsensitively.
      $result = -join ($arr | Sort-Object)
    
      $result # output
    
      # See if the result is different from the previous one.
      # Note the use of -cne rather than just -ne:
      # -cne is the case-*sensitive* variant of -ne
      if ($prevResult -and $prevResult -cne $result) {
        Write-Warning "Output order has changed."
        break
      }
      $prevResult = $result
    
    } 
    

    However, for a given input array, the two PowerShell editions differ with respect to sort stability, i.e. whether the input order of elements that sort the same is preserved:

    • In Windows PowerShell, Sort-Object is not stable, and there is no switch to request stable sorting.

    • In PowerShell (Core) 7, Sort-Object has a -Stable switch to request stable sorting, but, as of v7.4.x, sorting appears to be stable by default (and, conversely, -Stable:$false does not opt out). That said, to be future-proof and for conceptual clarity, it is better to specify -Stable explicitly when needed.

    Here's a quick example that illustrates the difference:

    # PowerShell 7 (as of v7.4.x works the same even without -Stable)
    # -> 'AabBCc', i.e. input order was preserved.
    -join ('A', 'b', 'C', 'a', 'B', 'c' | Sort-Object -Stable)
    
    # Windows PowerShell (no -Stable switch)
    # -> !! 'aABbcC', i.e. the input order was *not* preserved.
    -join ('A', 'b', 'C', 'a', 'B', 'c' | Sort-Object)
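
If input order must be preserved among equal elements in Windows PowerShell, one possible workaround (a sketch that is not part of the original answer) is to pair each element with its input position and use that position as a secondary sort key. The property names Value and Index and the variable names are hypothetical, chosen only for illustration.

```powershell
$inputLetters = 'A', 'b', 'C', 'a', 'B', 'c'

# Decorate each element with its position in the input.
$decorated = for ($i = 0; $i -lt $inputLetters.Count; $i++) {
  [pscustomobject] @{ Value = $inputLetters[$i]; Index = $i }
}

# Sort by value (case-insensitively, the default), breaking ties by input position.
$sorted = $decorated | Sort-Object -Property Value, Index

# Strip the decoration again.
$stableResult = -join $sorted.Value   # 'AabBCc'
```

Because no two elements share the same Index, the sort never has to order elements that compare as equal, so the result does not depend on whether the underlying sort is stable.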