First of all I need some very efficient solution as I am comparing collections with >300k elements.
At the beginning we have two different classes
Class A {
String keyA;
String keyB;
String keyC;
}
Class B {
String keyA;
String keyB;
String keyC;
String name;
String code;
toA() {
return new A(keyA, keyB, keyC);
}
}
Both of them contains several fields which are composed key(in this example key of three columns = keyA keyB keyC)
This composed key makes calculation very long for primitive brute forces using nested loops. So I figured out that the most efficient way would be to transform second class to first one using method toA and then I can safely compare them using for example google's api using Sets efficiency
Set<A> collectionA = <300k of elements>
Set<B> collectionB = <300k of elements>
Set<A> collectionBConvertedToA = collectionB.stream().map(item -> item.toA()).collect(toSet())
Set<A> result = Sets.differences(collectionBConvertedToA, collectionA); // very fast first full scan comparison
Set<String> changedNames = result.stream()
.map(outer -> collectionB.stream()
// very slow second full scan comparison
.filter(inner -> inner.getKeyA().equals(outer.getKeyA())
&& inner.getKeyB().equals(outer.getKeyB())
&& inner.getKeyC().equals(outer.getKeyC()))
.findFirst()
.map(item -> item.getName()))
.collect(toSet());
log.info("changed names" + changedNames);
Guava Sets.differences can find differences on Sets >300k in less than 1/10 of second but later on I still have full scan anyway to collect names.
I am just guessing, but is there something like
Set<B> result = Sets.differences(setA, setB, a -> a.customHashCode(), b -> b.customHashCode(), (a, b) -> a.customEquals(b))
with custom hashCode and custom equals methods to keep Sets efficiency or there is some better pattern to make such comparison as I believe it seems like common problem ?
EDIT I just figured out that I can just flip conversion to extended class
toB() {
return new B(keyA, keyB, keyC, null, null);
}
but then I need override hashCode and equals to use only those 3 fields and I still believe there is more elegant way
This is O(n^2)
since you are streaming collectionB
for each element in result. The following should work pretty fast:
Set<String> changedNames = collectionB.stream()
.filter(b -> collectionA.contains(b.toA())
.map(item -> item.getName()).collect(toSet());