Let's assume the following class:
class Foo {
    private Bar1 bar1;
    private Bar2 bar2;
    // many other fields

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Foo foo = (Foo) o;
        if (!bar1.equals(foo.getBar1())) return false;
        if (!bar2.equals(foo.getBar2())) return false;
        // etc...
        return true;
    }

    @Override
    public int hashCode() {
        int result = bar1.hashCode();
        result = 31 * result + bar2.hashCode();
        // etc...
        return result;
    }

    // setters & getters follow...
}
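(Aside: if the fields may be null, a more concise and null-safe variant can be written with java.util.Objects. This is just a sketch over the two fields shown; the real class has many more.)

import java.util.Objects;

@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    Foo foo = (Foo) o;
    return Objects.equals(bar1, foo.bar1)
            && Objects.equals(bar2, foo.bar2); // etc...
}

@Override
public int hashCode() {
    return Objects.hash(bar1, bar2 /* etc... */);
}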
Thousands of Foo instances per minute are created, processed, and subsequently recycled in a pool. The workflow is as follows:
Set<Foo> foos = new THashSet<>();
while (there-is-data) {
    String serializedDataFromApi = api.getData();
    Set<Foo> buffer = pool.deserializeAndCreate(serializedDataFromApi);
    foos.addAll(buffer);
}
processor.process(foos);
pool.recycle(foos);
The problem is that there can be duplicate Foo objects (with the same values) across different buffers. They are materialized as distinct instances of Foo, yet they are considered equal at the moment foos.addAll(buffer) is called.
My questions are:
What happens to those "duplicate" instances? Are they "lost" and garbage collected?
Yes, they become eligible for GC as soon as the current iteration of the while (there-is-data) loop finishes: addAll keeps no reference to an element it rejects as a duplicate, so the only remaining reference is buffer, which is dropped at the end of the iteration.
If I wanted to keep those instances available in the pool, what would be the most effective way to test for duplicates before inserting with addAll, and to recycle the rejected instances?
Set.add returns true if the element was inserted and false if it was a duplicate, so you can replace addAll with:

for (Foo f : buffer) {
    if (!foos.add(f)) {
        // handle duplicate
    }
}
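For example, assuming pool.recycle accepts any collection of Foo instances (the question already calls pool.recycle(foos)), a minimal sketch that returns the rejected duplicates to the pool could look like this:

import java.util.ArrayList;
import java.util.List;

List<Foo> duplicates = new ArrayList<>();
for (Foo f : buffer) {
    if (!foos.add(f)) {
        // f equals an instance already in foos; keep it so it can be reused
        duplicates.add(f);
    }
}
// return the rejected instances to the pool instead of letting them be GC'd
pool.recycle(duplicates);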
There will be no performance hit, because addAll does the same thing internally: it iterates over the collection and adds the elements one by one.
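For reference, OpenJDK's AbstractCollection.addAll is essentially the loop below (Trove's THashSet may additionally pre-size its internal table, but the per-element add calls are the same):

public boolean addAll(Collection<? extends E> c) {
    boolean modified = false;
    for (E e : c) {
        if (add(e)) {
            modified = true;
        }
    }
    return modified;
}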