
Default string comparison in C# disregards ASCII order of "_" and "0" and gives result opposite to Java


When I was converting a program from Java to C#, I noticed that the string comparison seems to behave differently in some cases:

In Java

System.out.println("_".compareTo("0")); // prints 47, "_" is larger than "0"

In C#

Console.WriteLine("_".CompareTo("0")); // prints -1, "_" is smaller than "0"

Does anyone know why C# considers "_" (underscore) smaller than "0", while Java does the opposite (which makes more sense to me, since it matches ASCII order)?

Updates

Thanks guys for pointing out the ordinal comparison.

Console.WriteLine(String.CompareOrdinal("_","0")); // prints 47, ordinal comparison does return the result reflecting ASCII
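For reference, the 47 isn't arbitrary: ordinal comparison returns the difference of the first pair of differing UTF-16 char values, and Java's compareTo is documented to do the same. A minimal Java check:

```java
public class OrdinalDiff {
    public static void main(String[] args) {
        // '_' is U+005F (95) and '0' is U+0030 (48), so the difference is 47.
        System.out.println('_' - '0');          // 47
        // compareTo returns exactly that char-value difference.
        System.out.println("_".compareTo("0")); // 47
    }
}
```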

I checked the docs and realized CompareTo performs a "culture-sensitive and case-sensitive comparison" and is NOT "ordinal" (https://learn.microsoft.com/en-us/dotnet/api/system.string.compareto?view=net-5.0#remarks). That's what makes the difference.


Solution

  • Java does the same as the 'ordinal' comparison in C#.

    The Java documentation says 'It compares strings on the basis of Unicode value of each character in the strings.'

    For non-surrogate Unicode characters this is the same as string.CompareOrdinal in C#, which "Compares two String objects by evaluating the numeric values of the corresponding Char objects in each string."

    I'm not sure whether they're the same for high Unicode code points (surrogate pairs), but I suspect they are, since both Java and C# use 16-bit char types.

    The 'standard' C# string.Compare, on the other hand, performs a 'culture-sensitive' comparison, which means it 'uses the current culture to obtain culture-specific information such as casing rules and the alphabetic order of individual characters'. You can read more about this in the documentation for System.Globalization.CompareOptions.
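On the surrogate-pair question: Java's compareTo works on 16-bit char values (UTF-16 code units), not on code points, so a supplementary character can sort before a BMP character with a smaller code point. A quick sketch (using U+1D400 and U+FFFD as arbitrary example characters):

```java
public class SurrogateOrder {
    public static void main(String[] args) {
        String supp = "\uD835\uDC00"; // U+1D400, stored as a surrogate pair
        String bmp  = "\uFFFD";       // U+FFFD, a single char near the top of the BMP
        // compareTo compares char values: the high surrogate 0xD835 is less
        // than 0xFFFD, so the supplementary string sorts first (negative)...
        System.out.println(supp.compareTo(bmp));
        // ...even though its code point (0x1D400) is the larger one:
        System.out.println(Integer.compare(supp.codePointAt(0), bmp.codePointAt(0))); // 1
    }
}
```

Since C#'s CompareOrdinal also evaluates the numeric values of the corresponding Char (UTF-16) objects, both languages order surrogate pairs the same way, by code unit rather than by code point.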
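Incidentally, Java's closest analogue to C#'s culture-sensitive comparison is java.text.Collator, which you have to opt into explicitly. Under its default rules, punctuation such as "_" sorts before digits, mirroring the C# CompareTo result from the question (a sketch, assuming the default US locale collation rules):

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorDemo {
    public static void main(String[] args) {
        // Collator performs locale-sensitive comparison, like C#'s default CompareTo.
        Collator collator = Collator.getInstance(Locale.US);
        // Its rules treat "_" as punctuation, which sorts before digits,
        // so the result is negative: "_" < "0".
        System.out.println(collator.compare("_", "0"));
    }
}
```

So the difference between the two languages is really about defaults: Java defaults to ordinal and offers collation via Collator, while C# defaults to culture-sensitive and offers ordinal via CompareOrdinal or StringComparison.Ordinal.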