I don't believe that SortedSet.IsSubsetOf is working as expected. Given the code below, shouldn't ss.IsSubsetOf(l) return True?
I suspect the problem is in my CompareTo function, but I can't spot it. I also added a separate loop, thinking the set had to appear as a sorted run inside the other IEnumerable, but that doesn't work either.
Can anyone explain why this behavior is happening?
List<Point> l = new();
SortedSet<Point> ss = new();
HashSet<Point> h = new();
for (int i = 0; i < 100; i++)
{
var p = new Point(i, i);
l.Add(p);
l.Add(p);
h.Add(p);
h.Add(p);
ss.Add(p);
ss.Add(p);
}
// additional loop: append the points again as a sorted run inside the list.
for (int i = 0; i < 100; i++)
{
l.Add(new Point(i, i));
}
Console.WriteLine(l.Count); // 300
Console.WriteLine(h.Count); // 100
Console.WriteLine(ss.Count); // 100
Console.WriteLine(h.IsSubsetOf(l)); // True (as expected)
Console.WriteLine(h.IsProperSubsetOf(l)); // False
// This is the line in question
Console.WriteLine(ss.IsSubsetOf(l)); // False ?????????
Console.WriteLine(ss.IsProperSubsetOf(l)); // False
readonly struct Point : IComparable<Point>, IEquatable<Point>
{
public Point(double x, double y)
{
X = x;
Y = y;
}
public double X { get; }
public double Y { get; }
public int CompareTo(Point other)
{
if (X == other.X)
{
return Y.CompareTo(other.Y);
}
else
{
return X.CompareTo(other.X);
}
}
public bool Equals(Point other)
{
return X == other.X && Y == other.Y;
}
public override int GetHashCode()
{
return HashCode.Combine(X, Y);
}
}
Yes, this is a bug in SortedSet.
As far as I understand, the erroneous code is the part that takes the size of the set (which is implemented as a red-black tree), later used as part of some bit magic:
int originalLastIndex = Count;
Meanwhile, the index of the current element is determined by the following code:
internal virtual int InternalIndexOf(T item)
{
Node? current = root;
int count = 0;
while (current != null)
{
int order = comparer.Compare(item, current.Item);
if (order == 0)
{
return count;
}
current = order < 0 ? current.Left : current.Right;
count = order < 0 ? (2 * count + 1) : (2 * count + 2);
}
return -1;
}
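To get a feel for how far that index can run, note that the math above assigns heap-style positions: a left step maps count to 2 * count + 1, a right step to 2 * count + 2, so a node reached by d right steps ends up at index 2^(d+1) - 2 no matter how few nodes the tree actually holds. A small sketch of that growth (illustrative only, not the library code):

```csharp
using System;

// Heap-style index assigned by InternalIndexOf along a chain of right
// children: each step applies count = 2 * count + 2.
int index = 0;
for (int depth = 1; depth <= 5; depth++)
{
    index = 2 * index + 2;
    Console.WriteLine($"after {depth} right step(s): index = {index}");
}
// last line printed: after 5 right step(s): index = 62
```

So in an 18-element set, whose bit array is sized from Count (a single 32-bit int), a node just 5 right steps deep already needs bit 62.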
Which is later used to mark the bit representing the encountered index:
int intArrayLength = BitHelper.ToIntArrayLength(originalLastIndex);
Span<int> span = stackalloc int[StackAllocThreshold];
BitHelper bitHelper = intArrayLength <= StackAllocThreshold ?
new BitHelper(span.Slice(0, intArrayLength), clear: true) :
new BitHelper(new int[intArrayLength], clear: false);
// ...
foreach (T item in other)
{
int index = InternalIndexOf(item);
if (index >= 0)
{
if (!bitHelper.IsMarked(index))
{
// item hasn't been seen yet
bitHelper.MarkBit(index);
}
}
// ...
}
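The failure then comes from how out-of-range bits behave: they are never marked and always read back as unmarked, so every duplicate of such an item looks "not seen yet". Here is a simplified stand-in for BitHelper (my own sketch; the real type is internal to the BCL, but to my reading its reads and writes are bounds-checked the same way):

```csharp
using System;

// Capacity of one int = 32 bits, as ToIntArrayLength(18) yields for an 18-element set.
var bits = new MiniBitHelper(intArrayLength: 1);
bits.MarkBit(62);                       // heap index from a deep node: silently ignored
Console.WriteLine(bits.IsMarked(62));   // prints False -> the item looks "never seen"

// Simplified stand-in for the internal BitHelper (illustrative only).
struct MiniBitHelper
{
    private readonly int[] _span;
    public MiniBitHelper(int intArrayLength) => _span = new int[intArrayLength];

    public void MarkBit(int bitPosition)
    {
        int i = bitPosition / 32;
        if ((uint)i < (uint)_span.Length)            // out of range: no-op
            _span[i] |= 1 << (bitPosition % 32);
    }

    public bool IsMarked(int bitPosition)
    {
        int i = bitPosition / 32;
        return (uint)i < (uint)_span.Length          // out of range: always false
            && (_span[i] & (1 << (bitPosition % 32))) != 0;
    }
}
```

Because IsMarked never becomes true for such an index, every occurrence of the item bumps the "unique items seen" counter, and the subset/equality bookkeeping comes out wrong.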
And the BitHelper implementation.
The problem is that in the unbalanced case (or maybe the index math is just not suitable for such usage), the index calculated by InternalIndexOf "overflows" the capacity of BitHelper, so the bit is never marked and duplicates of such items are repeatedly treated as not yet seen.
The minimal repro for me is the following:
SortedSet<int> ss = new();
HashSet<int> h = new();
var num = 18;
for (int i = 0; i < num; i++)
{
h.Add(i);
ss.Add(i);
}
var data = Enumerable.Range(0, num).Append(17);
var hashSetEquals = h.SetEquals(data); // True
var sortedSetEquals = ss.SetEquals(data); // False
Note that if you use the SortedSet constructor which produces a balanced tree (based on the implementation), the code will work:
HashSet<int> h = new();
var num = 18;
for (int i = 0; i < num; i++)
{
h.Add(i);
}
SortedSet<int> ss = new(h); // will balance the tree
var data = Enumerable.Range(0, num).Append(17);
var hashSetEquals = h.SetEquals(data);
var sortedSetEquals = ss.SetEquals(data);
Demo @sharplab.io
The same problem arises for the "balanced" case when the number of elements is divisible by 32 (the bit size of the int used in the calculations). Demo @sharplab. And it reproduces for some collection sizes of 64+ that I've checked.
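For a rough feel of when the capacity is exceeded: the bit array holds ceil(Count / 32) * 32 bits, while a node at depth d can receive an index up to 2^(d+1) - 2. Comparing the two for a few sizes (illustrative arithmetic only; the depths are plausible red-black tree depths, not measured ones):

```csharp
using System;

// Capacity in bits, per ToIntArrayLength(count) ints of 32 bits each.
static int CapacityBits(int count) => ((count + 31) / 32) * 32;

// Largest heap-style index a node at the given depth can get (all-right path).
static long MaxHeapIndex(int depth) => (1L << (depth + 1)) - 2;

foreach (var (count, depth) in new[] { (18, 4), (18, 5), (32, 5), (64, 6) })
{
    Console.WriteLine($"count={count}, depth={depth}: " +
        $"capacity={CapacityBits(count)} bits, max index={MaxHeapIndex(depth)}");
}
// e.g. count=18 at depth 5 needs index up to 62 but only 32 bits are available
```

This lines up with the repros above: 18 sequentially-inserted elements can leave a node at depth 5 (index up to 62, overflow), while a balanced 18-element tree stays at depth 4 (index at most 30, which fits).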
P.S.
Somebody has reported the issue @github