I'm trying to import data from a CSV file. Unfortunately, there is no primary key that would let me uniquely identify a given row, so I created a dictionary whose key is the value returned by GetHashCode. I use a dictionary because its lookup is much faster than a LINQ Where with conditions on several properties.
My GetHashCode override looks like this:
public override int GetHashCode()
{
    unchecked
    {
        int hash = 17;
        hash = hash * 23 + this.Id.GetHashCode();
        hash = hash * 23 + this.Author?.GetHashCode() ?? 0.GetHashCode();
        hash = hash * 23 + this.Activity?.GetHashCode() ?? 0.GetHashCode();
        hash = hash * 23 + this.DateTime?.GetHashCode() ?? 0.GetHashCode();
        return hash;
    }
}
After fetching the data from the DB, I do:
.ToDictionary(d => d.GetHashCode());
And here comes the problem: I checked the database and there are no duplicates with respect to these four properties. Yet when I run the import, I often get an error saying that the given key already exists in the dictionary. If I run the import again on the same data, everything works fine.
How can I fix this error? The import application is written in .NET 5.
Id - long
Author, Activity - string
DateTime - DateTime?
Unfortunately, this Id behaves more like a foreign key and is not unique: there may be many rows with the same Id, Author and Activity but, for example, a different DateTime.
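GetHashCode() does NOT produce unique values: it returns a 32-bit int, so different objects can end up with the same hash code. Using it as a key in a dictionary can therefore give you the errors that you have observed. In .NET Core and .NET 5+, string.GetHashCode() is also randomized per process, which is why the collisions can appear on one run and disappear on the next.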
You should implement GetHashCode() AND IEquatable<T> for your key type. Then you will be able to safely put instances of it into a hashing container, so long as there are no duplicate entries. (Items x and y will only be considered duplicates if the GetHashCode() values are the same AND x.Equals(y) returns true.)
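To make the difference concrete, here is a minimal sketch rather than a definitive demonstration. It assumes the current .NET behaviour in which long.GetHashCode() folds the upper and lower 32 bits together, so that 0L and -1L share a hash code; that is an implementation detail, not a documented guarantee, so the code only reports what it finds.

```csharp
using System;
using System.Collections.Generic;

long a = 0L;
long b = -1L;

// There are 2^64 possible longs but only 2^32 possible hash codes,
// so some distinct values must share a hash code.
bool sameHash = a.GetHashCode() == b.GetHashCode();
Console.WriteLine($"values equal: {a == b}, hash codes equal: {sameHash}");

// Keyed by hash code: a collision makes the second Add throw,
// even though the two values are different.
var byHash = new Dictionary<int, long>();
byHash.Add(a.GetHashCode(), a);
try
{
    byHash.Add(b.GetHashCode(), b);
}
catch (ArgumentException)
{
    Console.WriteLine("duplicate key when keyed by hash code");
}

// Keyed by the value itself: the dictionary uses the hash code only to
// narrow the search and then calls Equals, so both entries are stored.
var byValue = new Dictionary<long, string> { [a] = "first", [b] = "second" };
Console.WriteLine($"entries when keyed by value: {byValue.Count}");
```

The same reasoning applies to a composite key type: the dictionary needs both GetHashCode() and Equals() to tell two colliding keys apart.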
So for example, your data key class could look like this:
public sealed class DataKey : IEquatable<DataKey>
{
    public long Id { get; }
    public string? Author { get; }
    public string? Activity { get; }
    public DateTime? DateTime { get; }

    public DataKey(long id, string? author, string? activity, DateTime? dateTime)
    {
        Id = id;
        Author = author;
        Activity = activity;
        DateTime = dateTime;
    }

    public bool Equals(DataKey? other)
    {
        if (other is null)
            return false;
        if (ReferenceEquals(this, other))
            return true;
        return Id == other.Id && Author == other.Author && Activity == other.Activity && Nullable.Equals(DateTime, other.DateTime);
    }

    public override bool Equals(object? obj)
    {
        return ReferenceEquals(this, obj) || obj is DataKey other && Equals(other);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            var hashCode = Id.GetHashCode();
            hashCode = (hashCode * 397) ^ (Author?.GetHashCode() ?? 0);
            hashCode = (hashCode * 397) ^ (Activity?.GetHashCode() ?? 0);
            hashCode = (hashCode * 397) ^ (DateTime?.GetHashCode() ?? 0);
            return hashCode;
        }
    }
}
That's a lot of boilerplate code. Fortunately, if you are using C# 9 or later (available with .NET 5), you can use the record type to simplify this to just:
public sealed record DataKey(
    long Id,
    string? Author,
    string? Activity,
    DateTime? DateTime);
The record type implements IEquatable<T> and GetHashCode() correctly for you (for the specific types long, string? and DateTime?).
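As a rough sketch of how the record could replace the hash code as the dictionary key: the Row type and the sample values below are hypothetical stand-ins for whatever the import actually loads from the database, and only DataKey comes from the text above.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows: same Id, Author and Activity, different DateTime.
var rows = new List<Row>
{
    new Row(1, "alice", "login", new DateTime(2021, 1, 1)),
    new Row(1, "alice", "login", new DateTime(2021, 1, 2)),
    new Row(1, "alice", "login", null),
};

// Key each row by a DataKey instead of by d.GetHashCode().
var lookup = rows.ToDictionary(r => new DataKey(r.Id, r.Author, r.Activity, r.When));

// A freshly built key with the same values finds the stored row,
// because records compare by value.
var probe = new DataKey(1, "alice", "login", new DateTime(2021, 1, 2));
Console.WriteLine(lookup.ContainsKey(probe)); // True

public sealed record DataKey(long Id, string? Author, string? Activity, DateTime? DateTime);

// Hypothetical row type, used only for this illustration.
public sealed record Row(long Id, string? Author, string? Activity, DateTime? When);
```

ToDictionary will still throw if two rows really do share all four values, but in that case the exception points at a genuine duplicate rather than at a hash collision.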
Note that both example types above are immutable. When using hashing containers, it is very important that the properties of a key that contribute to GetHashCode() and Equals() are immutable. If you put an item in a hashing container and then change any of those properties, nasty things happen: lookups for that item can silently fail.
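The following sketch illustrates what can go wrong. MutableKey is a hypothetical type written only for this demonstration, deliberately given a settable property that feeds both Equals() and GetHashCode().

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var key = new MutableKey { Id = 1 };
var map = new Dictionary<MutableKey, string> { [key] = "row" };

Console.WriteLine(map.ContainsKey(key)); // True

// Mutate a property that contributes to the hash code after insertion.
key.Id = 2;

// The entry was filed under the old hash code, so the mutated key no longer finds it,
// and a fresh key with the old value fails the Equals check against the stored key.
Console.WriteLine(map.ContainsKey(key));                       // False
Console.WriteLine(map.ContainsKey(new MutableKey { Id = 1 })); // False
Console.WriteLine(map.Count);                                  // 1, the entry is still there but unreachable by key

// Hypothetical key type, deliberately mutable to show the problem.
public sealed class MutableKey : IEquatable<MutableKey>
{
    public long Id { get; set; }

    public bool Equals(MutableKey? other) => other is not null && Id == other.Id;

    public override bool Equals(object? obj) => obj is MutableKey other && Equals(other);

    public override int GetHashCode() => Id.GetHashCode();
}
```

With get-only properties, as in the class and record shown earlier, this situation cannot arise.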