Search code examples
javacomparatortreeset

Java arbitrary comparator on Object


Let's say I am building a TreeSet of an object for which the ordering depends on only one value.

I cannot do

TreeSet<Foo> tree = new TreeSet<>(Comparator.comparingInt(Foo::getX));

because if I add two different Foo objects with the same x then one would replace the other (i.e. if I do tree.add(foo1) and tree.add(foo2), tree.size() will be 1 instead of 2).

I could compare every field of Foo, but I would like two instances of Foo to be considered distinct, even if every field is the same.

One solution that "almost works" would be

TreeSet<Foo> tree = new TreeSet<>(Comparator.comparingInt(Foo::getX).thenComparing(Foo::hashCode));

But this would fail when there is a hash collision.

In summary, I am looking to something similar to

TreeSet<Foo> tree = new TreeSet<>(Comparator.comparingInt(Foo::getX).thenComparing(Foo::getInternalAddress));

But of course we don't have access to such method.

Workarounds

I know there are workarounds:

  • if I don't care about the objects themselves but just how many are in the tree, I can use a multiset TreeMap<Foo, Integer> (and comparing all the fields) that give me how many Foos for a specific x
  • if I do care about the objects (I am making reference equality checks) I can use a different multiset TreeMap<Foo, List<Foo>> (or TreeMap<Integer, List<Foo>> with the key being x). But if there are very few "duplicated" foos, this is a waste of space taken by all the singleton lists.

So while I know there are workarounds with a TreeMap, I still wonder if there is a way to do that with just a TreeSet.


Solution

  • Guava provides Ordering.arbitrary() which you can use to construct your desired Comparator:

    Comparator.comparingInt(Foo::getX).thenComparing(Ordering.arbitrary())
    

    Ordering.arbitrary() internally uses System.identityHashCode which almost solves your "order arbitrary, but consider identity" problem, except there's no guarantee that identityHashCode will never collide (trivially, since it's a 32bit value and a Java project can easily create more than 232 objects, given enough memory).

    Ordering.arbitrary() uses some neat tricks to still sort objects with colliding identityHashCode values in a consistent (but arbitrary!) way, see its code.