Tags: c#, .net, windows, string, string-comparison

Why is "ss" equal to the German sharp-s character 'ß'?


Coming from this question, I'm wondering why ä and ae are treated as different (which makes sense) while ß and ss are treated as equal. I haven't found an answer on SO, even though this question seems related and even mentions "that ß will compare equal to SS in Germany, or similar", but it doesn't explain why.

The only resource I found on MSDN was this: How to: Compare Strings

It mentions the following, but also doesn't explain why:

// "They dance in the street." 
// Linguistically (in Windows), "ss" is equal to 
// the German essetz: 'ß' character in both en-US and de-DE cultures. 
.....

So why do the following comparisons report the two strings as equal, with the de-DE culture as well as with any other culture:

using System;
using System.Globalization;

var ci = new CultureInfo("de-DE");
int result = ci.CompareInfo.Compare("strasse", "straße", CompareOptions.IgnoreNonSpace); // 0
bool equals = String.Equals("strasse", "straße", StringComparison.CurrentCulture); // true
equals = String.Equals("strasse", "straße", StringComparison.InvariantCulture);  // true

Solution

  • If you look at the Ä page, you'll see that Ä is not always a replacement for Æ (or ae), and that it is still used in various languages.

    The letter ß, by contrast:

    While the letter "ß" has been used in other languages, it is now only used in German. However, it is not used in Switzerland, Liechtenstein or Namibia.[1] German speakers in Germany, Austria, Belgium,[2] Denmark,[3] Luxembourg[4] and South Tyrol, Italy[5] follow the standard rules for ß.

    So ß is used in a single language with a single rule (ß == ss), while Ä is used in multiple languages with multiple rules.

    Note also how case folding is described:

    Case folding is primarily used for caseless comparison of text, such as identifiers in a computer program, rather than actual text transformation

    The official Unicode 7.0 Case Folding Properties file tells us that

    00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S

    where 00DF is ß and 0073 is s. The F status marks a full case folding, that is, a mapping that can expand one character into several, so for caseless comparison ß can be treated as ss.
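
To make the folding rule above concrete, here is a minimal sketch of a caseless comparison built on that single CaseFolding.txt entry. The names `fullFoldings` and `FoldCase` are hypothetical helpers written for this illustration, not .NET APIs. The table contains only the ß entry, and `char.ToLowerInvariant` is used as a rough stand-in for the remaining folding rules. It does not reproduce how the Windows or .NET comparison routines are actually implemented; it only shows why folding makes the two spellings match while an ordinal comparison does not.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical, hand-written excerpt of the Unicode full case foldings.
// Only the entry quoted above is included:
// 00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
var fullFoldings = new Dictionary<char, string>
{
    ['\u00DF'] = "ss",
};

// Illustrative helper (an assumption, not a framework method):
// folds a string so that it can be compared caselessly.
string FoldCase(string text)
{
    var folded = new StringBuilder(text.Length);
    foreach (char c in text)
    {
        if (fullFoldings.TryGetValue(c, out var replacement))
        {
            // Full folding: one character may expand to several.
            folded.Append(replacement);
        }
        else
        {
            // Approximation: simple lowercasing for everything else.
            folded.Append(char.ToLowerInvariant(c));
        }
    }
    return folded.ToString();
}

// Ordinal comparison looks at the raw UTF-16 code units, so it sees a difference.
bool ordinalEqual = string.Equals("strasse", "straße", StringComparison.Ordinal);

// After folding, both spellings become "strasse".
bool foldedEqual = string.Equals(FoldCase("STRASSE"), FoldCase("straße"), StringComparison.Ordinal);

Console.WriteLine(ordinalEqual); // False
Console.WriteLine(foldedEqual);  // True
```

If the ß/ss equivalence is not wanted, comparing with `StringComparison.Ordinal` as in the first call keeps the two spellings distinct.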