Search code examples
c#collectionssortedset

SortedSet.IsSubsetOf not working as expected


I don't believe that SortedSet.IsSubsetOf is working as expected. Given this code, shouldn't ss.IsSubsetOf(l) return True?

I'm suspicious that the problem is in my CompareTo function, but I can't see the problem.

I also added a separate loop thinking that the set had to be found sorted inside the other IEnumerable . . . but that doesn't work either.

Can anyone explain why this behavior is happening?

List<Point> l = new();
SortedSet<Point> ss = new();
HashSet<Point> h = new();

for (int i = 0; i < 100; i++)
{
    var p = new Point(i, i);
    l.Add(p);
    l.Add(p);
    h.Add(p);
    h.Add(p);
    ss.Add(p);
    ss.Add(p);
}

// additional loop to get a sorted set.
for (int i = 0; i < 100; i++)
{
    l.Add(new Point(i, i));
}

Console.WriteLine(l.Count); // 300
Console.WriteLine(h.Count); // 100
Console.WriteLine(ss.Count); // 100
Console.WriteLine(h.IsSubsetOf(l));  // True (as expected)
Console.WriteLine(h.IsProperSubsetOf(l)); // False

// This is the line in question
Console.WriteLine(ss.IsSubsetOf(l)); // False ?????????

Console.WriteLine(ss.IsProperSubsetOf(l)); // False

readonly struct Point : IComparable<Point>, IEquatable<Point>
{
    public Point(double x, double y)
    {
        X = x;
        Y = y;
    }

    public double X { get; }
    public double Y { get; }

    public int CompareTo(Point other)
    {
        if (X == other.X)
        {
            return Y.CompareTo(other.Y);
        }
        else
        {
            return X.CompareTo(other.X);
        }
    }

    public bool Equals(Point other)
    {
        return X == other.X && Y == other.Y;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(X, Y);
    }
}

Solution

  • Yes, this is a bug in the SortedSet.

    As far as I understand the erroneous code is the one determining the size of the set (which seems to be a red-black tree) to later be used as part of some bit magic:

    int originalLastIndex = Count;
    

    While the index of the current element is determined with the following code:

    internal virtual int InternalIndexOf(T item)
    {
        Node? current = root;
        int count = 0;
        while (current != null)
        {
            int order = comparer.Compare(item, current.Item);
            if (order == 0)
            {
                return count;
            }
    
            current = order < 0 ? current.Left : current.Right;
            count = order < 0 ? (2 * count + 1) : (2 * count + 2);
        }
    
        return -1;
    }
    

    Which later is used for marking bits representing the encountered index:

    int intArrayLength = BitHelper.ToIntArrayLength(originalLastIndex);
    
    Span<int> span = stackalloc int[StackAllocThreshold];
    BitHelper bitHelper = intArrayLength <= StackAllocThreshold ?
        new BitHelper(span.Slice(0, intArrayLength), clear: true) :
        new BitHelper(new int[intArrayLength], clear: false);
    
    // ...
    
    foreach (T item in other)
    {
        int index = InternalIndexOf(item);
        if (index >= 0)
        {
            if (!bitHelper.IsMarked(index))
            {
                // item hasn't been seen yet
                bitHelper.MarkBit(index);
            }
        }
        // ...
    }
    

    And BitHelper implementation.

    The problem is that for unbalanced case (or maybe the index math is not "suitable" for such usage) the index calculated by InternalIndexOf "overflows" the capacity of BitHelper resulting in misses for unbalanced duplicate items.

    The minimal repro for me is the following:

    SortedSet<int> ss = new();
    HashSet<int> h = new();
    
    var num = 18;
    for (int i = 0; i < num; i++)
    {
        h.Add(i);
        ss.Add(i);
    }
    
    var data = Enumerable.Range(0, num).Append(17);
    var hashSetEquals = h.SetEquals(data); // True
    var sortedSetEquals = ss.SetEquals(data); // False 
    

    Note that if you use the constructor for SortedSet which results in balanced (based on the implementation) tree the code will work:

    HashSet<int> h = new();
    var num = 18;
    for (int i = 0; i < num; i++)
    {
        h.Add(i);
    }
    SortedSet<int> ss = new(h); // will balance the tree
    var data = Enumerable.Range(0, num).Append(17);
    var hashSetEquals = h.SetEquals(data);
    var sortedSetEquals = ss.SetEquals(data);
    

    Demo @sharplab.io

    The same problem arises for "balanced" when number of elements is divisible by 32 (bit size of int used in calculations). Demo @ sharplab. And for some collection sizes 64+ I've checked.

    P.S.

    Somebody has reported the issue @github