Tags: c#, list, collections, extension-methods, corruption

How to implement a specialized overload of the List.RemoveAll method, with an index parameter in the predicate?


List<T>.RemoveAll is quite a useful method that removes multiple items from a list efficiently. Unfortunately, in some scenarios I need extra features that the method doesn't have, and guarantees that the documentation doesn't provide. It also behaves questionably when the match predicate fails (throws), which makes me anxious. So in this question I am asking for an implementation of the same method, in the form of an extension method, with these features and characteristics:

  1. Instead of a Predicate<T> it accepts a Func<T, int, bool> delegate, where the int is the zero-based index of the T item.
  2. It guarantees that the predicate will be invoked exactly once for each item, in strictly ascending order.
  3. In case the predicate returns true for some items and then throws for another item, the items already selected for removal are removed from the list before the exception propagates.

Here is the signature of the extension method that I am trying to implement:

public static int RemoveAll<T>(this List<T> list, Func<T, int, bool> predicate);

It returns the number of elements that were removed.

I attempted to implement it using the existing implementation as a starting point, but its performance optimizations make it quite complex, and injecting the desired "exceptional" behavior is not obvious. I am interested in an implementation that is simple and reasonably efficient. Using LINQ in the implementation is not desirable, because it implies memory allocations that I would like to avoid.


Context: Let me demonstrate the behavior of the built-in List<T>.RemoveAll method and explain why I don't like it. If the match predicate throws for an item in the middle of the list, the items already selected for removal are either not removed or are replaced with duplicates of other elements. In all cases the list retains its original size. Here is a minimal demo:

List<int> list = new(Enumerable.Range(1, 15));
Console.WriteLine($"Before RemoveAll: [{String.Join(", ", list)}]");
try
{
    list.RemoveAll(item =>
    {
        if (item == 10) throw new Exception();
        bool removeIt = item % 2 == 1;
        if (removeIt) Console.WriteLine($"Removing #{item}");
        return removeIt;
    });
}
catch (Exception ex) { Console.WriteLine(ex); }
finally
{
    Console.WriteLine($"After RemoveAll: [{String.Join(", ", list)}]");
}

The list has 15 numbers, and the intention is to remove the odd numbers from the list. The predicate throws for the 10th number.

Output:

Before RemoveAll: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
Removing #1
Removing #3
Removing #5
Removing #7
Removing #9
System.Exception: Exception of type 'System.Exception' was thrown.
   at Program.<>c.<Main>b__0_0(Int32 item)
   at System.Collections.Generic.List`1.RemoveAll(Predicate`1 match)
   at Program.Main()
After RemoveAll: [2, 4, 6, 8, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

As you can see, the numbers 1 and 3 have been removed, the numbers 5, 7 and 9 are still there, and the numbers 6 and 8 have been duplicated (there are two occurrences of each). In contrast, the output that I expected to see is:

After RemoveAll: [2, 4, 6, 8, 10, 11, 12, 13, 14, 15]

This would be reasonable and predictable behavior that I could count on. It keeps the danger at a manageable level. I am not risking, for example, duplicating items in a virtual shopping cart, or printing some PDF documents from a selection twice. The existing behavior stretches my comfort level a bit too far.
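
The duplicated 6 and 8 are what I would expect from any in-place compaction that copies the kept items forward and trims the list only at the very end. The sketch below is my own simplified illustration of that idea, not the actual source of List<T>.RemoveAll; NaiveCompactionDemo, NaiveRemoveAll and Run are hypothetical names. Assuming the built-in method works along these lines, the sketch reproduces the corrupted output shown above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NaiveCompactionDemo
{
    // Hypothetical helper, NOT the actual List<T>.RemoveAll source.
    public static int NaiveRemoveAll<T>(List<T> list, Predicate<T> match)
    {
        int write = 0; // next slot for an item that is kept
        for (int read = 0; read < list.Count; read++)
        {
            if (match(list[read])) continue;            // if this throws, the list is never trimmed
            if (write < read) list[write] = list[read]; // copy the kept item forward
            write++;
        }
        int removed = list.Count - write;
        list.RemoveRange(write, removed); // reached only when no predicate call threw
        return removed;
    }

    // Replays the demo from the question against the naive helper.
    public static List<int> Run()
    {
        var list = new List<int>(Enumerable.Range(1, 15));
        try
        {
            NaiveRemoveAll(list, item =>
            {
                if (item == 10) throw new InvalidOperationException();
                return item % 2 == 1;
            });
        }
        catch (InvalidOperationException) { }
        return list;
    }

    public static void Main()
    {
        Console.WriteLine($"[{string.Join(", ", Run())}]");
        // Prints: [2, 4, 6, 8, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
    }
}
```

By the time the predicate throws for 10, the items 2, 4, 6 and 8 have already been copied into the first four slots, but their original copies at positions 6 and 8 have not been overwritten and the tail has not been cut off.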

I have reported this behavior to Microsoft, and the feedback I got is that in case of failure the outcome is undefined. From their point of view there is no difference between the two outputs above (the actual and the expected). Both are equally corrupted, because both represent a state that is neither the original nor the final/correct state after a successful execution. So they don't think there is a bug that needs fixing, and they are not keen on making changes that could negatively affect the performance of successful executions. They also believe that the existing behavior is not surprising or unexpected, so there is no reason to document it.


Solution

  • I think that I've managed to come up with an implementation that satisfies all three requirements:

    /// <summary>
    /// Removes all the elements that match the conditions defined by the specified
    /// predicate. In case the predicate fails for some element, the list is left
    /// in a state recognizable as the result of successful individual Remove calls.
    /// </summary>
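    public static int RemoveAll<T>(this List<T> list, Func<T, int, bool> predicate)
    {
        // ArgumentNullException.ThrowIfNull requires .NET 6 or later.
        ArgumentNullException.ThrowIfNull(list);
        ArgumentNullException.ThrowIfNull(predicate);
    
        // CollectionsMarshal lives in System.Runtime.InteropServices.
        Span<T> span = CollectionsMarshal.AsSpan(list);
        int i = 0, j = 0; // i: read index, j: write index (next slot for a kept item)
        try
        {
            for (; i < span.Length; i++)
            {
                if (predicate(span[i], i)) continue; // selected for removal
                if (j < i) span[j] = span[i]; // shift the kept item to the left
                j++;
            }
        }
        finally
        {
            // Runs on success and on failure. On failure, i is the index of the
            // item whose predicate threw; that item and all later items are kept.
            if (j < i)
            {
                for (; i < span.Length; i++, j++)
                    span[j] = span[i];
                list.RemoveRange(j, span.Length - j); // trim the stale tail
            }
        }
        return i - j; // number of removed items
    }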
    

    For better performance it uses the CollectionsMarshal.AsSpan method (available since .NET 5) to get a Span<T> out of the list. The algorithm works just as well with the indexer of the list instead of the span, replacing span.Length with list.Count.

    I haven't benchmarked this implementation, but I expect it to be only marginally slower than the built-in implementation.
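
For runtimes where CollectionsMarshal.AsSpan is not available, the indexer-based variant mentioned above could look like the following. This is a minimal sketch of the same algorithm with list[i] in place of span[i] and list.Count in place of span.Length; the class names ListExtensionsSketch and Demo are hypothetical, and the null checks are written out by hand on the assumption that ArgumentNullException.ThrowIfNull may not be available either:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ListExtensionsSketch
{
    public static int RemoveAll<T>(this List<T> list, Func<T, int, bool> predicate)
    {
        if (list is null) throw new ArgumentNullException(nameof(list));
        if (predicate is null) throw new ArgumentNullException(nameof(predicate));

        int i = 0, j = 0; // i: read index, j: write index
        try
        {
            for (; i < list.Count; i++)
            {
                if (predicate(list[i], i)) continue; // selected for removal
                if (j < i) list[j] = list[i];        // shift the kept item left
                j++;
            }
        }
        finally
        {
            if (j < i)
            {
                // Keep everything from the current position onward, then trim.
                for (; i < list.Count; i++, j++)
                    list[j] = list[i];
                list.RemoveRange(j, list.Count - j);
            }
        }
        return i - j;
    }
}

public static class Demo
{
    public static void Main()
    {
        var list = new List<int>(Enumerable.Range(1, 15));
        try
        {
            // The two-parameter lambda binds to the extension method.
            list.RemoveAll((item, index) =>
            {
                if (item == 10) throw new InvalidOperationException();
                return item % 2 == 1;
            });
        }
        catch (InvalidOperationException) { }
        Console.WriteLine($"After RemoveAll: [{string.Join(", ", list)}]");
        // Prints: After RemoveAll: [2, 4, 6, 8, 10, 11, 12, 13, 14, 15]
    }
}
```

With the same failing predicate as in the question, the odd numbers before 10 are removed, 10 and everything after it are kept, and the exception still propagates to the caller.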