When I was converting a program from Java to C#, I noticed that the string comparison seems to behave differently in some cases:
In Java
System.out.println("_".compareTo("0")); // prints 47, "_" is larger than "0"
In C#
Console.WriteLine("_".CompareTo("0")); // prints -1, "_" is smaller than "0"
Does anyone know why C# considers "_" (underscore) smaller than "0", while Java does the opposite (which makes more sense to me because it matches ASCII order)?
Updates
Thanks guys for pointing out the ordinal comparison.
Console.WriteLine(String.CompareOrdinal("_","0")); // prints 47; ordinal comparison does reflect ASCII order
I checked the docs and realized that CompareTo performs a "culture-sensitive and case-sensitive comparison" and is NOT "ordinal" (https://learn.microsoft.com/en-us/dotnet/api/system.string.compareto?view=net-5.0#remarks). That might be what makes the difference.
Java does the same as the 'ordinal' comparison in C#.
The Java documentation says 'It compares strings on the basis of Unicode value of each character in the strings.'
For non-surrogate Unicode characters this is the same as string.CompareOrdinal in C#, which "Compares two String objects by evaluating the numeric values of the corresponding Char objects in each string."
I'm not sure whether they're also the same for high Unicode code points (surrogate pairs), but I suspect they are, since both Java and C# use 16-bit char types.
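If you want to probe the surrogate-pair case yourself, here is a rough C# sketch ("\uD83D\uDE00" is U+1F600 encoded as a surrogate pair; the Java equivalent using compareTo on the same literals should behave the same way, since both compare UTF-16 code units):

string emoji = "\uD83D\uDE00"; // U+1F600, stored as two UTF-16 code units (a surrogate pair)
string bmp = "\uFFFF";         // U+FFFF, a single code unit
Console.WriteLine(String.CompareOrdinal(bmp, emoji)); // positive: 0xFFFF > 0xD83D by code units, even though U+FFFF < U+1F600 as code points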
The 'standard' C# string.Compare, on the other hand, performs a 'culture-sensitive' comparison, which means it 'uses the current culture to obtain culture-specific information such as casing rules and the alphabetic order of individual characters'.
You can read more about this at the documentation for System.Globalization.CompareOptions.
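As a rough sketch of what the culture-aware path looks like when you spell it out via CompareInfo (using the invariant culture here just as an example; results are from my machine):

using System.Globalization;

CompareInfo ci = CultureInfo.InvariantCulture.CompareInfo;
Console.WriteLine(ci.Compare("_", "0", CompareOptions.None));    // negative: culture-aware collation puts "_" before "0"
Console.WriteLine(ci.Compare("_", "0", CompareOptions.Ordinal)); // positive: back to raw code-unit order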