Java: Object pooling and hash sets


Let's assume the following class:

class Foo {

  private Bar1 bar1;
  private Bar2 bar2;

  // many other fields

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    Foo foo = (Foo) o;
    if (!bar1.equals(foo.getBar1())) return false;
    if (!bar2.equals(foo.getBar2())) return false;
    // etc...
    return true;
  }

  @Override
  public int hashCode() {
    int result = bar1.hashCode();
    result = 31 * result + bar2.hashCode();
    // etc...
    return result;
  }

  // setters & getters follow...
}

Thousands of Foo instances per minute are created, processed and subsequently recycled in a pool. The workflow is as follows:

Set<Foo> foos = new THashSet<>();
while (there-is-data) {

  String serializedDataFromApi = api.getData();
  Set<Foo> buffer = pool.deserializeAndCreate(serializedDataFromApi);
  foos.addAll(buffer);
}

processor.process(foos);
pool.recycle(foos);

The problem is that there can be duplicate Foo objects (with the same values) across different buffers. They are materialized as different instances of Foo, but they are considered equal when foos.addAll(buffer) is called.

My questions are:

  • What happens to those "duplicate" instances?
  • Are they "lost" and garbage collected?
  • If I wanted to keep those instances available in pool, what would be the most effective way to test for duplicates before inserting using addAll and recycling instances?

Solution

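  • What happens to those "duplicate" instances? Are they "lost" and garbage collected?

    Yes. The set keeps the instance it already contains and does not store the duplicate, so the duplicates become eligible for GC as soon as the current iteration of while (there-is-data) finishes and buffer goes out of scope (assuming nothing else, such as the pool, still holds a reference to them).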

  • If I wanted to keep those instances available in pool, what would be the most effective way to test for duplicates before inserting using addAll and recycling instances?

    Set.add returns true if the element was inserted and false if it is a duplicate, so you can replace addAll with:

    for (Foo f : buffer) {
      if (!foos.add(f)) {
        // handle duplicate
      }
    }
    

    There should be no performance hit, because addAll does essentially the same thing: it iterates over the collection and adds the elements one by one.
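
As a rough illustration of how the "handle duplicate" branch could hand instances back to the pool, here is a minimal, self-contained sketch. Everything beyond Set.add is an assumption made for the example: Foo is reduced to two String fields, FooPool and addOrRecycle are hypothetical names, and java.util.HashSet stands in for THashSet so that the code runs without extra dependencies.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

class DuplicateRecyclingSketch {

    // Simplified stand-in for Foo: two String fields instead of Bar1/Bar2.
    static final class Foo {
        final String bar1;
        final String bar2;

        Foo(String bar1, String bar2) {
            this.bar1 = bar1;
            this.bar2 = bar2;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            Foo foo = (Foo) o;
            return Objects.equals(bar1, foo.bar1) && Objects.equals(bar2, foo.bar2);
        }

        @Override
        public int hashCode() {
            return Objects.hash(bar1, bar2);
        }
    }

    // Hypothetical pool: it only keeps track of the instances handed back to it.
    static final class FooPool {
        private final Deque<Foo> free = new ArrayDeque<>();

        void recycle(Foo foo) {
            free.push(foo);
        }

        int available() {
            return free.size();
        }
    }

    // Adds each element of buffer to foos. An element that is already present
    // (Set.add returns false) is recycled immediately instead of being dropped.
    // Returns the number of duplicates that were recycled.
    static int addOrRecycle(Set<Foo> foos, Iterable<Foo> buffer, FooPool pool) {
        int duplicates = 0;
        for (Foo f : buffer) {
            if (!foos.add(f)) {
                pool.recycle(f);
                duplicates++;
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        FooPool pool = new FooPool();
        Set<Foo> foos = new HashSet<>();

        List<Foo> first = Arrays.asList(new Foo("a", "1"), new Foo("b", "2"));
        List<Foo> second = Arrays.asList(new Foo("a", "1"), new Foo("c", "3"));

        addOrRecycle(foos, first, pool);
        int duplicates = addOrRecycle(foos, second, pool);

        // Prints "3 unique, 1 recycled"
        System.out.println(foos.size() + " unique, " + duplicates + " recycled");
    }
}
```

With this approach the duplicates return to the pool right away, while the instances that made it into foos are still recycled after processing, as in the original workflow.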