c# .net string icomparer

Can I obtain the result string used for comparisons with CompareOptions?


I have a custom IComparer<string> that I use to compare strings while ignoring case and symbols:

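using System.Collections.Generic;
using System.Globalization;

public class LiberalStringComparer : IComparer<string>
{
    private readonly CompareInfo _compareInfo = CultureInfo.InvariantCulture.CompareInfo;

    // OrdinalIgnoreCase must be used on its own, so IgnoreCase is combined with IgnoreSymbols here.
    private const CompareOptions COMPARE_OPTIONS = CompareOptions.IgnoreSymbols | CompareOptions.IgnoreCase;

    public int Compare(string x, string y)
    {
        // Two nulls (or the same instance) are equal; otherwise null sorts first.
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;

        return _compareInfo.Compare(x, y, COMPARE_OPTIONS);
    }
}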

Can I obtain the output string which is, ultimately, used for the comparison?

My final goal is to produce an IEqualityComparer<string> which ignores symbols and casing in the same way as this comparer.

I could write a regex to do this, but there's no guarantee that my regex would apply the same logic as the built-in comparison options.


Solution

  • There is probably no such "output string". I'd implement your Equals like this:

    return liberalStringComparer.Compare(x, y) == 0;
    

    GetHashCode is more complicated.

    Some approaches:

    1. Use a trivial implementation such as return 0;. This is valid, but every lookup then has to fall back to Compare to decide equality, so hash-based collections lose their speed advantage.
    2. Since your comparison is relatively simple (invariant culture, ignoring case and symbols), you should be able to write a hash that works in most cases. Without extensive study of Unicode and testing, however, I wouldn't assume it works for every valid Unicode string from any culture.

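      As a rough sketch:

      public int GetHashCode(string value)
      {
          if (value == null) return 0;
          int hash = 17;
          for (int i = 0; i < value.Length; i++)
          {
              // char.IsSymbol only approximates what IgnoreSymbols skips.
              if (!char.IsSymbol(value, i))
                  // Combine in the style of http://stackoverflow.com/a/263416/781792
                  hash = unchecked(hash * 23 + char.ToUpperInvariant(value[i]));
          }
          return hash;
      }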
      
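    3. Form a string by removing every character for which char.IsSymbol is true, then call StringComparer.InvariantCultureIgnoreCase.GetHashCode on it. (The case-sensitive StringComparer.InvariantCulture would hash "a" and "A" differently.)
    4. The hash code of the SortKey returned by CompareInfo.GetSortKey should be a suitable value, provided the same options are passed as in Compare.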

      public int GetHashCode(string value)
      {
          return _compareInfo.GetSortKey(value, COMPARE_OPTIONS).GetHashCode();
      }
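
Putting the Equals suggestion and approach 4 together, a minimal sketch of the whole IEqualityComparer<string> might look like the following. The class name LiberalStringEqualityComparer and the field names are made up for illustration. It also assumes CompareOptions.IgnoreCase in place of OrdinalIgnoreCase, because the framework rejects OrdinalIgnoreCase when it is combined with other flags.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical class name; not part of the original question or answer.
public sealed class LiberalStringEqualityComparer : IEqualityComparer<string>
{
    private static readonly CompareInfo Invariant = CultureInfo.InvariantCulture.CompareInfo;

    // Assumption: IgnoreCase replaces OrdinalIgnoreCase, which must be used alone.
    private const CompareOptions Options =
        CompareOptions.IgnoreSymbols | CompareOptions.IgnoreCase;

    public bool Equals(string x, string y)
    {
        if (ReferenceEquals(x, y)) return true;    // same instance, or both null
        if (x == null || y == null) return false;  // exactly one is null
        return Invariant.Compare(x, y, Options) == 0;
    }

    public int GetHashCode(string value)
    {
        if (value == null) return 0;
        // Hash the sort key built with the same options that Equals compares with,
        // so that strings treated as equal should land in the same bucket.
        return Invariant.GetSortKey(value, Options).GetHashCode();
    }
}
```

It could then be handed to any hash-based collection, for example new HashSet<string>(new LiberalStringEqualityComparer()). Whether "Hello, World!" and "hello world" end up as one entry depends on which characters the runtime's IgnoreSymbols handling skips, so it is worth testing with your own data. Building a SortKey allocates on every call; newer frameworks also offer CompareInfo.GetHashCode(string, CompareOptions), which may be a cheaper alternative where it is available.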