I am storing a large number of arrays in a List, but I don't want to store an array if its data already exists in the list; the order of the data doesn't matter. I figured GetHashCode would be appropriate because I assumed it ignored order. However, the simple test below shows that the first two string[] arrays, a1 and a2, produce different hash codes.
Can I not use this method of checking? Can someone suggest a better way, please?
string[] a1 = { "cat", "bird", "dog" };
string[] a2 = { "cat", "dog", "bird" };
string[] a3 = { "cat", "fish", "dog" };
Console.WriteLine(a1.GetHashCode());
Console.WriteLine(a2.GetHashCode());
Console.WriteLine(a3.GetHashCode());
The test above prints three different hash codes.
Ideally, I would have liked to see the same hash code for a1 and a2, so I am looking for something that lets me quickly check whether those strings already exist.
Your arrays aren't equal by the standard arrays use for determining equality: two separately created arrays are never equal, regardless of their contents. GetHashCode() follows the same standard, so it doesn't look at the elements either.
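To make that concrete, here is a small sketch of how array equality behaves. The names b1 and alias are made up for illustration, and SequenceEqual is shown only for contrast: it compares elements in order, so it would not solve the order-insensitive case from the question.

```csharp
using System;
using System.Linq;

string[] a1 = { "cat", "bird", "dog" };
string[] b1 = { "cat", "bird", "dog" };  // same elements, same order, but a separate array
string[] alias = a1;                      // a second reference to the very same array

Console.WriteLine(a1.Equals(b1));        // False: two separately created arrays
Console.WriteLine(a1.Equals(alias));     // True: one array, two references
Console.WriteLine(a1.GetHashCode() == alias.GetHashCode()); // True: same object
Console.WriteLine(a1.SequenceEqual(b1)); // True: element-by-element, order-sensitive
```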
If you want separately created collections with equal elements to compare as equal, then use a collection type which supports that.
I recommend HashSet<T>, in your case HashSet<string>. It doesn't provide the GetHashCode() and Equals() behaviour you want directly, but it has a CreateSetComparer() method that provides you with a helper class that does give you hash code and comparer methods that do what you want.
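A minimal sketch of how that could look with the arrays from the question. The variable names (comparer, s1, seen and so on) are mine, and it assumes duplicates within a single array don't matter to you, since a HashSet<string> silently drops them.

```csharp
using System;
using System.Collections.Generic;

string[] a1 = { "cat", "bird", "dog" };
string[] a2 = { "cat", "dog", "bird" };
string[] a3 = { "cat", "fish", "dog" };

// The comparer treats two sets as equal when they contain the same elements.
IEqualityComparer<HashSet<string>> comparer = HashSet<string>.CreateSetComparer();

var s1 = new HashSet<string>(a1);
var s2 = new HashSet<string>(a2);
var s3 = new HashSet<string>(a3);

Console.WriteLine(comparer.GetHashCode(s1) == comparer.GetHashCode(s2)); // True
Console.WriteLine(comparer.Equals(s1, s2)); // True: same elements, different order
Console.WriteLine(comparer.Equals(s1, s3)); // False: "bird" vs "fish"

// Using the comparer for the outer collection rejects duplicates on Add.
var seen = new HashSet<HashSet<string>>(comparer);
Console.WriteLine(seen.Add(s1)); // True: stored
Console.WriteLine(seen.Add(s2)); // False: already present
Console.WriteLine(seen.Add(s3)); // True: stored
Console.WriteLine(seen.Count);   // 2
```

Replacing the List with an outer HashSet like this means the "does it already exist" check happens as part of Add, so there is no separate scan of the list.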
Just remember that you cannot use a hash code for a quick equality check, only for a quick inequality check. Two objects that are not equal may still have the same hash code, basically by random chance. Only when the hash codes differ can you skip the full equality check.
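If you ever do the check by hand, the pattern could be sketched like this. SameSet is a hypothetical helper written for this example, not something the framework provides; hash-based collections such as HashSet and Dictionary already do the equivalent internally.

```csharp
using System;
using System.Collections.Generic;

var cmp = HashSet<string>.CreateSetComparer();

// Hypothetical helper: use the hash code only to rule equality out.
static bool SameSet(HashSet<string> x, HashSet<string> y,
                    IEqualityComparer<HashSet<string>> c)
{
    if (c.GetHashCode(x) != c.GetHashCode(y))
        return false;        // different hash codes: definitely not equal
    return c.Equals(x, y);   // same hash code: still has to be confirmed
}

var s1 = new HashSet<string> { "cat", "bird", "dog" };
var s2 = new HashSet<string> { "cat", "dog", "bird" };
var s3 = new HashSet<string> { "cat", "fish", "dog" };

Console.WriteLine(SameSet(s1, s2, cmp)); // True
Console.WriteLine(SameSet(s1, s3, cmp)); // False
```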