I have a custom IComparer<string> that I use to compare strings while ignoring case and symbols:
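using System.Collections.Generic;
using System.Globalization;

public class LiberalStringComparer : IComparer<string>
{
    private readonly CompareInfo _compareInfo = CultureInfo.InvariantCulture.CompareInfo;
    // CompareOptions.OrdinalIgnoreCase must be used alone; combined with IgnoreSymbols it throws ArgumentException.
    private const CompareOptions COMPARE_OPTIONS = CompareOptions.IgnoreSymbols | CompareOptions.IgnoreCase;

    public int Compare(string x, string y)
    {
        // Nulls sort first; two nulls compare as equal.
        if (x == null) return y == null ? 0 : -1;
        if (y == null) return 1;
        return this._compareInfo.Compare(x, y, COMPARE_OPTIONS);
    }
}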
Can I obtain the output string that is ultimately used for the comparison?
My final goal is to produce an IEqualityComparer<string> that ignores symbols and casing in exactly the same way as this comparer.
I could write a regex to do this, but there's no guarantee that it would follow the same logic as the built-in comparison options.
There is probably no such "output string". I'd implement your Equals like this:
return liberalStringComparer.Compare(x, y) == 0;
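As a sketch of how that Equals fits into a full IEqualityComparer<string>: the class name LiberalEqualityComparer and its hash-function constructor parameter are hypothetical, chosen only so that any of the GetHashCode strategies can be plugged in.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: equality is delegated entirely to an ordering comparer
// (e.g. LiberalStringComparer), and the hashing strategy is supplied by the caller.
public sealed class LiberalEqualityComparer : IEqualityComparer<string>
{
    private readonly IComparer<string> _comparer;
    private readonly Func<string, int> _hash;

    public LiberalEqualityComparer(IComparer<string> comparer, Func<string, int> hash)
    {
        _comparer = comparer;
        _hash = hash;
    }

    // Two strings are equal exactly when the ordering comparer says they tie.
    public bool Equals(string x, string y) => _comparer.Compare(x, y) == 0;

    // The supplied function must return the same value whenever Equals(x, y) is true.
    public int GetHashCode(string value) => _hash(value);
}
```

For example, new LiberalEqualityComparer(new LiberalStringComparer(), s => 0) gives the simplest, but slowest, hashing strategy.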
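GetHashCode is more complicated: it must return the same value for any two strings that Equals considers equal, so it has to ignore case and symbols exactly the way Compare does.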
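Since GetHashCode must agree with Equals, it helps to check any candidate hash against Equals on sample data. A minimal sketch of such a check follows; HashContractCheck.Holds is a hypothetical helper, not a library API.

```csharp
using System.Collections.Generic;

public static class HashContractCheck
{
    // Returns false if any two samples are Equal but hash differently,
    // which would break Dictionary/HashSet lookups with this comparer.
    public static bool Holds(IEqualityComparer<string> comparer, IReadOnlyList<string> samples)
    {
        for (int i = 0; i < samples.Count; i++)
        {
            for (int j = i + 1; j < samples.Count; j++)
            {
                if (comparer.Equals(samples[i], samples[j]) &&
                    comparer.GetHashCode(samples[i]) != comparer.GetHashCode(samples[j]))
                {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Feeding it strings with mixed case, punctuation, white space and non-ASCII text is a cheap way to catch a hash that disagrees with Compare.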
Some approaches:
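1. return 0; (which means you always have to run a Compare to know whether two strings are equal, so hash-based collections degrade to linear scans).
2. Since your comparison is relatively simple (invariant culture, case-insensitive, symbols ignored), you should be able to make a hash that generally works. Note that IgnoreSymbols also skips white space and punctuation, not only the characters for which char.IsSymbol is true, so the filter has to cover those as well. Without extensive study of Unicode and testing, however, I wouldn't recommend that you assume this'll work for any valid Unicode string from any culture.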
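In code (the character filter is only an approximation of what IgnoreSymbols skips):
public int GetHashCode(string value)
{
    if (value == null) return 0;
    int hash = 17;
    foreach (char c in value)
    {
        // Skip symbols, punctuation and white space, as IgnoreSymbols does (approximately).
        if (char.IsSymbol(c) || char.IsPunctuation(c) || char.IsWhiteSpace(c)) continue;
        // Combine the case-folded character, like http://stackoverflow.com/a/263416/781792
        hash = unchecked(hash * 23 + char.ToUpperInvariant(c));
    }
    return hash;
}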
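3. Remove the characters for which char.IsSymbol is true (along with punctuation and white space), then use StringComparer.InvariantCultureIgnoreCase.GetHashCode on the result. A case-sensitive hash such as StringComparer.InvariantCulture would give "A" and "a" different hashes even though Equals treats them as equal.
4. Use the hash code of CompareInfo.GetSortKey. Strings that compare as equal under the same options should produce equal sort keys, so this should be a suitable value: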
public int GetHashCode(string value)
{
    // GetSortKey throws on null, so give null a fixed hash.
    if (value == null) return 0;
    return _compareInfo.GetSortKey(value, COMPARE_OPTIONS).GetHashCode();
}
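Combining the Equals suggestion with the sort-key hash gives a complete equality comparer. This is a sketch under a few assumptions: the class name SortKeyEqualityComparer is hypothetical, and it uses CompareOptions.IgnoreCase | CompareOptions.IgnoreSymbols, because CompareOptions.OrdinalIgnoreCase must be used on its own and CompareInfo rejects it in combination with IgnoreSymbols.

```csharp
using System.Collections.Generic;
using System.Globalization;

// Hypothetical equality comparer mirroring LiberalStringComparer:
// invariant culture, case ignored, symbols/punctuation/white space ignored.
public sealed class SortKeyEqualityComparer : IEqualityComparer<string>
{
    private static readonly CompareInfo _compareInfo = CultureInfo.InvariantCulture.CompareInfo;
    // Assumed options; OrdinalIgnoreCase is only valid when used alone.
    private const CompareOptions COMPARE_OPTIONS = CompareOptions.IgnoreCase | CompareOptions.IgnoreSymbols;

    public bool Equals(string x, string y)
    {
        // Two nulls are equal; null never equals a non-null string.
        if (x == null || y == null) return x == y;
        return _compareInfo.Compare(x, y, COMPARE_OPTIONS) == 0;
    }

    public int GetHashCode(string value)
    {
        if (value == null) return 0;
        // Strings that compare equal under the same options should yield identical sort keys.
        return _compareInfo.GetSortKey(value, COMPARE_OPTIONS).GetHashCode();
    }
}
```

A HashSet<string> created with new HashSet<string>(new SortKeyEqualityComparer()) then treats "Hello" and "HELLO" as the same element. On .NET Core 2.0 and later, StringComparer.Create(CultureInfo.InvariantCulture, CompareOptions.IgnoreCase | CompareOptions.IgnoreSymbols) should give you a built-in StringComparer with the same options, which implements both IComparer<string> and IEqualityComparer<string>.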