Assume there is this class:
public class Foo
{
    public int Id { get; set; }
    public int? NullableId { get; set; }

    public Foo(int id, int? nullableId)
    {
        Id = id;
        NullableId = nullableId;
    }
}
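I need to compare these objects by the following rules: Id must always match, and NullableId is compared only when both objects have a value for it (if either NullableId is null, it is ignored).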
To achieve this I have overridden Equals and GetHashCode like this:
public override bool Equals(object obj)
{
    var otherFoo = (Foo)obj;
    var equalityCondition = Id == otherFoo.Id;
    if (NullableId.HasValue && otherFoo.NullableId.HasValue)
        equalityCondition &= (NullableId == otherFoo.NullableId);
    return equalityCondition;
}

public override int GetHashCode()
{
    var hashCode = 806340729;
    hashCode = hashCode * -1521134295 + Id.GetHashCode();
    return hashCode;
}
Further down I have two lists of Foo:
var first = new List<Foo> { new Foo(1, null) };
var second = new List<Foo> { new Foo(1, 1), new Foo(1, 2), new Foo(1, 3) };
Next, I want to join these lists. If I do it like this:
var result = second.Join(first, s => s, f => f, (f, s) => new {f, s}).ToList();
then the result is as I expected: I get 3 items. But if I change the order and join first with second:
var result = first.Join(second, f => f, s => s, (f, s) => new {f, s}).ToList();
then the result has only 1 item: the pair new Foo(1, null) and new Foo(1, 3).
I cannot see what I am doing wrong. If I put a breakpoint in the Equals method, I can see that it compares items from the same list (e.g. new Foo(1, 1) and new Foo(1, 2)). It looks to me like this happens because of the Lookup that is created inside the Join method.
Could someone clarify what happens there? What should I change to get the desired behavior?
Your Equals method is reflexive and symmetric, but it is not transitive.
Your implementation doesn't meet the requirements specified in the docs:
If (x.Equals(y) && y.Equals(z)) returns true, then x.Equals(z) returns true.
from https://learn.microsoft.com/en-us/dotnet/api/system.object.equals?view=netframework-4.8
For example, suppose you have:
var x = new Foo(1, 100);
var y = new Foo(1, null);
var z = new Foo(1, 200);
You have x.Equals(y) and y.Equals(z), which implies that you should also have x.Equals(z), but your implementation does not do this. Since you don't meet the specification, you can't expect any algorithms reliant on your Equals method to behave correctly.
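To make this concrete, here is a small sketch that reuses the Foo class and the overrides from the question and prints the three comparisons. It also prints how many groups ToLookup builds from each list. As far as I understand it, Join groups its inner sequence in a similar way and then picks a single group for each outer element, which would explain why the order of the two lists matters; treat that part as my reading of the behavior rather than a documented guarantee. The class name TransitivityDemo is just a name I made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same Foo as in the question, including its Equals/GetHashCode overrides.
public class Foo
{
    public int Id { get; set; }
    public int? NullableId { get; set; }

    public Foo(int id, int? nullableId)
    {
        Id = id;
        NullableId = nullableId;
    }

    public override bool Equals(object obj)
    {
        var otherFoo = (Foo)obj;
        var equalityCondition = Id == otherFoo.Id;
        if (NullableId.HasValue && otherFoo.NullableId.HasValue)
            equalityCondition &= (NullableId == otherFoo.NullableId);
        return equalityCondition;
    }

    public override int GetHashCode()
    {
        var hashCode = 806340729;
        hashCode = hashCode * -1521134295 + Id.GetHashCode();
        return hashCode;
    }
}

// Hypothetical demo class, not part of the original code.
public static class TransitivityDemo
{
    public static void Main()
    {
        var x = new Foo(1, 100);
        var y = new Foo(1, null);
        var z = new Foo(1, 200);

        Console.WriteLine(x.Equals(y)); // True
        Console.WriteLine(y.Equals(z)); // True
        Console.WriteLine(x.Equals(z)); // False: transitivity is broken

        var first = new List<Foo> { new Foo(1, null) };
        var second = new List<Foo> { new Foo(1, 1), new Foo(1, 2), new Foo(1, 3) };

        // Grouping by the objects themselves: the three items in second are
        // mutually unequal, so they end up in three separate groups.
        Console.WriteLine(first.ToLookup(f => f).Count);  // 1
        Console.WriteLine(second.ToLookup(s => s).Count); // 3
    }
}
```

When second is the grouped side, Foo(1, null) is "equal" to all three groups, but a lookup by key can only hand back one of them, so only one pairing comes out.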
You ask what you can do instead. This depends on exactly what you need to do. Part of the problem is that it's not really clear what is intended in the corner-cases, if indeed they can appear. What should happen if one Id appears multiple times with the same NullableId in one or both lists? For a simple example, if new Foo(1, 1) exists in the first list three times, and the second list three times, what should be in the output? Nine items, one for each pairing?
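For reference, nine pairings is what a plain Join gives you when the same key occurs three times on each side. Here is a tiny sketch with ints standing in for the duplicated key; DuplicateJoinDemo and CountPairs are names I invented for the illustration.

```csharp
using System;
using System.Linq;

// Hypothetical demo class, not part of the original code.
public static class DuplicateJoinDemo
{
    // Counts the pairs Join produces when the same key occurs
    // `copies` times in each of the two sequences.
    public static int CountPairs(int copies)
    {
        var first = Enumerable.Repeat(1, copies);
        var second = Enumerable.Repeat(1, copies);
        return first.Join(second, f => f, s => s, (f, s) => (f, s)).Count();
    }

    public static void Main()
    {
        Console.WriteLine(CountPairs(3)); // 9
    }
}
```

If that is not what you want for duplicates, you need to decide on the rule first (for example, deduplicating each list before joining).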
Here's a naive attempt to solve your problem. This joins on only Id and then filters out any pairings that have incompatible NullableId. But you might not be expecting the duplicates when an Id appears multiple times in each list, as can be seen in the example output.
using System;
using System.Linq;
using System.Collections.Generic;

public class Foo
{
    public int Id { get; set; }
    public int? NullableId { get; set; }

    public Foo(int id, int? nullableId)
    {
        Id = id;
        NullableId = nullableId;
    }

    public override string ToString() => $"Foo({Id}, {NullableId?.ToString() ?? "null"})";
}

class MainClass {
    public static IEnumerable<Foo> JoinFoos(IEnumerable<Foo> first, IEnumerable<Foo> second) {
        return first
            // Pair up everything that shares an Id...
            .Join(second, f => f.Id, s => s.Id, (f, s) => new { f, s })
            // ...then keep only pairs whose NullableId values are compatible.
            .Where(fs =>
                fs.f.NullableId == null ||
                fs.s.NullableId == null ||
                fs.f.NullableId == fs.s.NullableId)
            .Select(fs => new Foo(fs.f.Id, fs.f.NullableId ?? fs.s.NullableId));
    }

    public static void Main (string[] args) {
        var first = new List<Foo> { new Foo(1, null), new Foo(1, null), new Foo(1, 3) };
        var second = new List<Foo> { new Foo(1, 1), new Foo(1, 2), new Foo(1, 3), new Foo(1, null) };
        foreach (var f in JoinFoos(first, second)) {
            Console.WriteLine(f);
        }
    }
}
Output:
Foo(1, 1)
Foo(1, 2)
Foo(1, 3)
Foo(1, null)
Foo(1, 1)
Foo(1, 2)
Foo(1, 3)
Foo(1, null)
Foo(1, 3)
Foo(1, 3)
It also might be too slow for you if you have tens of thousands of items with the same Id, because it builds up every possible pair with matching Id before filtering them out. If each list has 10,000 items with Id == 1 then that's 100,000,000 pairs to pick through.
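If that turns out to be a problem, one possible way around it is to index the second list up front so that incompatible pairs are never built in the first place. The following is only a sketch under that assumption, and JoinFoosIndexed is a name I made up; it swaps the join-then-filter approach for two ToLookup indexes (one keyed by Id, one keyed by the Id and NullableId pair). It should produce the same pairings as the naive version, but it cannot shrink the output itself: an item with a null NullableId still genuinely matches every item with the same Id.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Foo
{
    public int Id { get; set; }
    public int? NullableId { get; set; }

    public Foo(int id, int? nullableId)
    {
        Id = id;
        NullableId = nullableId;
    }

    public override string ToString() => $"Foo({Id}, {NullableId?.ToString() ?? "null"})";
}

// Hypothetical helper class, not part of the original answer.
public static class IndexedJoin
{
    public static IEnumerable<Foo> JoinFoosIndexed(IEnumerable<Foo> first, IEnumerable<Foo> second)
    {
        var secondList = second.ToList();
        // All items of second with a given Id.
        var byId = secondList.ToLookup(s => s.Id);
        // Items of second with a given (Id, NullableId) combination.
        var byBoth = secondList.ToLookup(s => (s.Id, s.NullableId));

        foreach (var f in first)
        {
            // A null NullableId matches everything with the same Id.
            // Otherwise only an exact match or a null on the other side is compatible.
            IEnumerable<Foo> matches = f.NullableId == null
                ? byId[f.Id]
                : byBoth[(f.Id, f.NullableId)].Concat(byBoth[(f.Id, (int?)null)]);

            foreach (var s in matches)
                yield return new Foo(f.Id, f.NullableId ?? s.NullableId);
        }
    }

    public static void Main()
    {
        var first = new List<Foo> { new Foo(1, null), new Foo(1, null), new Foo(1, 3) };
        var second = new List<Foo> { new Foo(1, 1), new Foo(1, 2), new Foo(1, 3), new Foo(1, null) };
        foreach (var f in JoinFoosIndexed(first, second))
            Console.WriteLine(f);
    }
}
```

A lookup indexer returns an empty sequence for a missing key, so no existence checks are needed before the Concat.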