Search code examples
javaarraylistcollections

Collection API remove method implementation


I have had a look at the Java Collection API. The Collection interface defines the remove method as follows:

public interface Collection<E> {
    // other methods
    boolean add(E e);
    boolean remove(Object o);
}

You can see that the remove method's parameter is of type Object instead of the collection's element type E. I understand the logic behind this design decision: I prefer getting false rather than a ClassCastException. It also relieves client code from casting objects prior to calling this method. The weird thing, however, is in the implementation of the remove method by some collections. Take, for example, the ArrayList implementation:

public class ArrayList<E> extends AbstractList<E> {
    //Partial listing
    
    public boolean remove(Object o) {
        final Object[] es = elementData;
        final int size = this.size;
        int i = 0;
        found: {
            if (o == null) {
                for (; i < size; i++)
                    if (es[i] == null)
                        break found;
            } else {
                for (; i < size; i++)
                    if (o.equals(es[i])) //heeeeeeeeer!!!!!!!!!
                        break found;
            }
            return false;
        }
        fastRemove(es, i);
        return true;
    }
}

Here, the implementation calls the equals method on the foreign object o rather than on the "trusted" collection elements. Isn't it safer to call the equals method on the elements of the collection (es[i].equals(o)) instead? For instance, if o is some malicious object that does not adhere to the equals contract, the remove method may remove random elements from the collection:

class MalObject {
    @Override
    public boolean equals(Object o) {
      return true;
    }
}

With the current remove implementation, calling list.remove(malObject) would randomly remove elements from the list.

I can't see any obvious reason why not to call the eequals method on the collection elements instead. What is the reason behind this implementation, is there any benefit behind it?


Solution

  • Your question assumes that the elements inside the collection are 'trusted' and the element you are supplying to the remove method is 'malicious'.

    You're making stuff up. "Malicious" is meaningless here. If you actually mean: That object is created by an entity that intends to make your machine do acts against your wishes, then you have already lost, the JVM is not hardened at all; if an object 'made it into heap' that is crafted by a malicious entity, that's it. It's over. Trivially, for example, when using a HashSet or HashMap, these collections must invoke .hashCode() on the supplied object to function, and that can also format your disk drives just as well.

    There are thus two options:

    1. You meant 'malicious' as in 'untrusted'. In which case, this question is moot: You're already compromised, switching the equals invocation around is not going to make even one iota of difference; you're just as compromised.
    2. You meant it differently. In which case: What did you mean, and explain why o is 'more malicious' than the elements in the collection.

    Isn't it safer to call the equals method on the elements of the collection

    That would be a no, then, unless I've missed some key insight and you meant it differently. Neither is safer than the other; it is, for the purposes of security, utterly irrelevant which way around you invoke these methods.

    With the current remove implementation, calling list.remove(malObject) would randomly remove elements from the list.

    No, it would remove the first checked element from the list. For those collections without a defined order, that's still true, but which object is 'first checked' is not defined. For all others, the first object is deleted. This is identical to the reverse scenario: If the objects in the list have that silly definition of equals, then the exact same thing would happen. And the spec of Collection requires all this.

    What is the reason behind this implementation, is there any benefit behind it?

    Yes, there is. It saves a null check, for starters: Collections can, unless the specific implementation mentions otherwise, contain null. The definition of remove treats null as having meaning and semantic equality consistency (in contrast to SQL where NULL is always defined as representing 'unknown', and NULL = NULL is NULL (which is falsy), not TRUE) - Calling list.remove(null) on a list with ["Foo", null, "Bar", "Baz", null] results in the list contains ["Foo", "Bar", "Baz", null] - the spec requires this. Hence, that if check would then boil down to:

    if (es[i] != null && es[i].equals(o)) which is more work.

    Even without that null check, doing it the other way around is slower. Assuming that it takes a couple of iterations before the object is found (or, if the object isn't in the collection at all, an iteration through all elements of the collection), the same equals impl is invoked every time (namely, the one of o). This can be faster depending on a million details (performance analysis is incredibly difficult as it depends on a million factors that are hard to isolate). It probably won't make a big difference, but if it makes a difference, it's in favour of invoking the same equals implementation every time.