Search code examples
javacollectionssetremoveall

How to guarantee the behavior of Set.removeAll with different type of sets?


I have a problem with the HashSet and TreeSet manipulation. Here is a simple JUnit 4 test explaining my problem :

import java.util.HashSet;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicReference;

import org.junit.Assert;
import org.junit.Test;

public class TreeSetTest<T> {

    @Test
    public void test() {
        final HashSet<Object> set1 = new HashSet<>();
        final TreeSet<Object> set2 = new TreeSet<>((a, b) -> a.toString().compareTo(b.toString()));
        Assert.assertEquals(0, set1.size()); // OK
        Assert.assertEquals(0, set2.size()); // OK
        set1.add(new AtomicReference<>("A"));
        set1.add(new AtomicReference<>("B"));
        set1.add(new AtomicReference<>("C"));
        Assert.assertEquals(3, set1.size()); // OK
        Assert.assertEquals(0, set2.size()); // OK
        set1.removeAll(set2);
        Assert.assertEquals(3, set1.size()); // OK
        Assert.assertEquals(0, set2.size()); // OK
        set2.add(new AtomicReference<>("A"));
        set1.removeAll(set2);
        Assert.assertEquals(3, set1.size()); // OK Nothing has been removed
        Assert.assertEquals(1, set2.size());
        set2.add(new AtomicReference<>("B"));
        set1.removeAll(set2);
        Assert.assertEquals(3, set1.size()); // OK Nothing has been removed
        Assert.assertEquals(2, set2.size());
        set2.add(new AtomicReference<>("C"));
        set1.removeAll(set2);
        Assert.assertEquals(3, set1.size()); // Error Everything has been removed and size is now 0
        Assert.assertEquals(3, set2.size());
    }

}

When removing all elements of set2 from set1, I'm expecting to use the equality comparator of set1 which is the case as long as set2 has a size less than the one of set1 but if the size of set2 is greater or equals to the size of set1, the comparison is made from the set2.

This is very bad for me because it makes the program unpredictable.

I think it can be considered as a bug in the java implementation but my concern is: How can I guarantee the expected behavior without rewritting eveything?

Edit 1 after @FedericoPeraltaSchaffner comment:

AtomicReference was just for providing a simple example. In fact, I am using a class which is final from a library so I cannot improve it easily. But even considering a valid class implementing correctly hashCode and equals, my problem remains. Consider this now :

package fr.ncenerar.so;

import java.util.HashSet;
import java.util.TreeSet;

import org.junit.Assert;
import org.junit.Test;

public class TreeSetTest<T> {

    public static class MyObj {
        private final int value1;
        private final int value2;

        public MyObj(final int v1, final int v2) {
            super();
            this.value1 = v1;
            this.value2 = v2;
        }

        public int getValue1() {
            return this.value1;
        }

        public int getValue2() {
            return this.value2;
        }

        @Override
        public int hashCode() {
            final int prime = 31;
            int result = 1;
            result = prime * result + this.value1;
            result = prime * result + this.value2;
            return result;
        }

        @Override
        public boolean equals(final Object obj) {
            if (this == obj) {
                return true;
            }
            if (obj == null) {
                return false;
            }
            if (this.getClass() != obj.getClass()) {
                return false;
            }
            final MyObj other = (MyObj) obj;
            if (this.value1 != other.value1) {
                return false;
            }
            if (this.value2 != other.value2) {
                return false;
            }
            return true;
        }
    }

    @Test
    public void test() {
        final HashSet<MyObj> set1 = new HashSet<>();
        final TreeSet<MyObj> set2 = new TreeSet<>((a, b) -> a.getValue1() - b.getValue1());
        Assert.assertEquals(0, set1.size()); // OK
        Assert.assertEquals(0, set2.size()); // OK
        set1.add(new MyObj(0, 0));
        set1.add(new MyObj(1, 1));
        set1.add(new MyObj(2, 2));
        Assert.assertEquals(3, set1.size()); // OK
        Assert.assertEquals(0, set2.size()); // OK
        set1.removeAll(set2);
        Assert.assertEquals(3, set1.size()); // OK
        Assert.assertEquals(0, set2.size()); // OK
        set2.add(new MyObj(0, 1));
        set1.removeAll(set2);
        Assert.assertEquals(3, set1.size()); // OK Nothing has been removed
        Assert.assertEquals(1, set2.size());
        set2.add(new MyObj(1, 2));
        set1.removeAll(set2);
        Assert.assertEquals(3, set1.size()); // OK Nothing has been removed
        Assert.assertEquals(2, set2.size());
        set2.add(new MyObj(2, 3));
        set1.removeAll(set2);
        Assert.assertEquals(3, set1.size()); // Error Everything has been removed
        Assert.assertEquals(3, set2.size());
    }

}

The problem is still there and MyObj implementation is correct. The problem comes from the fact that I am using the objects from two differents aspects. In one set, I want to keep one instance of each objects based on their equality (as in equals method of the object) and in another set, I want a subset of the first set in which for each value1, I want to keep only the firstly inserted element.

Using a TreeSet seemed valid.

Edit 2:

My bad, I missed that part of the TreeSet documentation:

Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. (See Comparable or Comparator for a precise definition of consistent withequals.) This is so because the Set interface is defined interms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.

If I understand correctly, I can use a TreeSet for my purpose but can't expect it to behave like I want it to.

Thank you all for your help.


Solution

  • After searching and reading correctly the documentation about TreeSet, it turns out that:

    Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. (See Comparableor Comparator for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.

    Which means that the TreeSet used in the example cannot be used as a Set. Therefore, the easiest solution is to create a HashSet for the removeAlloperations by replacing every:

    set1.removeAll(set2);
    

    by

    set1.removeAll(new HashSet<>(set2));
    

    Maybe not the best solution from a performance point of view but a working one.

    Thank you all!