Tags: c#, dictionary, .net-core, gethashcode

GetHashCode returns the same int value for different objects


I'm trying to import data from a CSV file. Unfortunately, there is no primary key that would allow me to uniquely identify a given row, so I created a dictionary whose key is the value returned by GetHashCode. I use a dictionary because its lookups are much faster than searching with LINQ and a Where clause with conditions on several properties.

My GetHashCode override looks like this:

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + this.Id.GetHashCode();
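            // ?? binds more loosely than +, so the null fallback must be parenthesized;
            // otherwise a null value turns the whole sum into null and resets hash to 0.
            hash = hash * 23 + (this.Author?.GetHashCode() ?? 0);
            hash = hash * 23 + (this.Activity?.GetHashCode() ?? 0);
            hash = hash * 23 + (this.DateTime?.GetHashCode() ?? 0);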
            return hash;
        }
    }

After fetching the data from the database, I call:

    .ToDictionary(d => d.GetHashCode());

And here is the problem: I checked the database and there are no duplicates across these four properties. But when I run the import, I often get an error saying that an item with the same key has already been added to the dictionary, yet if I run the import again on the same data, everything works fine.

How can I fix this error? The import application is written in .NET 5.

Id - long

Author, Activity - string

DateTime - DateTime?

Unfortunately, this Id is more like a foreign key and is not unique: there may be many rows with the same Id, Author and Activity but, e.g., a different DateTime.


Solution

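  • GetHashCode() does NOT produce unique values, so using it as a dictionary key can give you the errors you have observed. An int has only 2^32 possible values, so different rows can, and eventually will, share a hash code. In addition, on .NET Core and .NET 5 string.GetHashCode() is randomized per process, which is why a collision can occur in one run and disappear in the next.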

    You should override GetHashCode() and Equals() and implement IEquatable<T> for a key type, and then use instances of that type, not their hash codes, as the dictionary keys. You can then safely put them into a hashing container, as long as there are no duplicate entries. (Items x and y are only considered duplicates if their GetHashCode() values are equal AND x.Equals(y) returns true.)
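    A minimal sketch of why keying by the value works and keying by its hash code does not. It uses long values because long.GetHashCode() (in current .NET runtimes) XORs the upper and lower 32 bits, so the illustrative values -1, 0 and 4294967297 all hash to 0. Keyed by the values themselves, a Dictionary keeps all three because it falls back to Equals when hash codes match; keyed by the hash codes, as in the question's ToDictionary(d => d.GetHashCode()), it throws on the duplicate key.

    ```csharp
    using System;
    using System.Linq;

    // Illustrative values: three distinct longs whose hash codes are all 0.
    long[] ids = { -1L, 0L, 4294967297L };

    foreach (long id in ids)
        Console.WriteLine($"{id} -> {id.GetHashCode()}"); // each line ends in "-> 0"

    // Keyed by the values: the hash codes collide, Equals tells them apart, all 3 are kept.
    var byValue = ids.ToDictionary(id => id);
    Console.WriteLine(byValue.Count); // 3

    // Keyed by the hash codes: every key is the int 0, so ToDictionary throws.
    bool threw = false;
    try
    {
        ids.ToDictionary(id => id.GetHashCode());
    }
    catch (ArgumentException)
    {
        threw = true; // duplicate key
    }
    Console.WriteLine(threw); // True
    ```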

    For example, your key class could look like this:

    public sealed class DataKey : IEquatable<DataKey>
    {
        public long      Id       { get; }
        public string?   Author   { get; }
        public string?   Activity { get; }
        public DateTime? DateTime { get; }
    
        public DataKey(long id, string? author, string? activity, DateTime? dateTime)
        {
            Id       = id;
            Author   = author;
            Activity = activity;
            DateTime = dateTime;
        }
    
        public bool Equals(DataKey? other)
        {
            if (other is null)
                return false;
    
            if (ReferenceEquals(this, other))
                return true;
    
            return Id == other.Id && Author == other.Author && Activity == other.Activity && Nullable.Equals(DateTime, other.DateTime);
        }
    
        public override bool Equals(object? obj)
        {
            return ReferenceEquals(this, obj) || obj is DataKey other && Equals(other);
        }
    
        public override int GetHashCode()
        {
            unchecked
            {
                var hashCode = Id.GetHashCode();
                hashCode = (hashCode * 397) ^ (Author?.GetHashCode() ?? 0);
                hashCode = (hashCode * 397) ^ (Activity?.GetHashCode() ?? 0);
                hashCode = (hashCode * 397) ^ (DateTime?.GetHashCode() ?? 0);
                return hashCode;
            }
        }
    }
    

    That's a lot of boilerplate code. Fortunately, since you are on .NET 5, you can use a record type (introduced in C# 9, the default language version for .NET 5) to reduce it to just:

     public sealed record DataKey(
         long      Id,
         string?   Author,
         string?   Activity,
         DateTime? DateTime);
    

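    For a record, the compiler generates Equals(), IEquatable<T> and GetHashCode() from the values of its properties. That gives correct results here because the property types long, string? and DateTime? all have value-based equality themselves.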
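    If declaring a type feels heavy for a one-off import, a value tuple gives the same value-based equality, since ValueTuple compares and hashes each element with EqualityComparer<T>.Default. This is a swapped-in alternative to the record, not the answer's method. The sketch keys some hypothetical sample rows, shaped like the question's Id, Author, Activity and DateTime columns, by the tuple of all four values rather than by a hash code, so rows that differ only in DateTime get separate entries.

    ```csharp
    using System;
    using System.Linq;

    // Hypothetical sample rows: the first two share Id, Author and Activity
    // and differ only in DateTime, as described in the question.
    var rows = new (long Id, string? Author, string? Activity, DateTime? DateTime)[]
    {
        (1, "alice", "login", new DateTime(2021, 5, 1)),
        (1, "alice", "login", new DateTime(2021, 5, 2)),
        (2, null, "logout", null),
    };

    // The key is the tuple of all four values, not its hash code. Collisions of the
    // underlying hash codes are harmless because Dictionary also calls Equals.
    var byKey = rows.ToDictionary(r => (r.Id, r.Author, r.Activity, r.DateTime));
    Console.WriteLine(byKey.Count); // 3

    // Lookups build a key from the same four values.
    bool found = byKey.TryGetValue((1L, "alice", "login", new DateTime(2021, 5, 2)), out var row);
    Console.WriteLine(found); // True
    ```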

    Note that both example types above are immutable. When using hashing containers, it is very important that the key properties that contribute to GetHashCode() and Equals() cannot change. If you put an item into a hashing container and then change any of those properties, the container will generally no longer find it, because it is filed under its old hash code.
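    The hazard can be shown without declaring a mutable class by using a HashSet<int> as the key together with HashSet<int>.CreateSetComparer(), a BCL comparer that hashes a set by its contents. This is only a stand-in for a mutable key type. Once the set is modified, its hash code no longer matches the one recorded when it was added, so the lookup fails, and writing through the same reference adds a second entry for the very same object.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // The set comparer derives a set's hash code from its elements,
    // so this key's hash code changes whenever the set is modified.
    var dict = new Dictionary<HashSet<int>, string>(HashSet<int>.CreateSetComparer());

    var key = new HashSet<int> { 1 };
    dict[key] = "first";
    Console.WriteLine(dict.ContainsKey(key)); // True

    key.Add(2); // mutate state that contributes to the hash code

    // The lookup uses the new hash code, so the stored entry is not found...
    Console.WriteLine(dict.ContainsKey(key)); // False

    // ...and assigning through the same reference adds a second entry.
    dict[key] = "second";
    Console.WriteLine(dict.Count); // 2
    Console.WriteLine(dict.Keys.Count(k => object.ReferenceEquals(k, key))); // 2
    ```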