Search code examples
javaalgorithmperformanceguava

Efficient way to compare two sets of different type


First of all I need some very efficient solution as I am comparing collections with >300k elements.

At the beginning we have two different classes

Class A {
   String keyA;
   String keyB;
   String keyC;
}

Class B {
   String keyA;
   String keyB;
   String keyC;
   String name;
   String code;

   toA() {
     return new A(keyA, keyB, keyC);
   }
}

Both of them contains several fields which are composed key(in this example key of three columns = keyA keyB keyC)

This composed key makes calculation very long for primitive brute forces using nested loops. So I figured out that the most efficient way would be to transform second class to first one using method toA and then I can safely compare them using for example google's api using Sets efficiency

Set<A> collectionA = <300k of elements>
Set<B> collectionB = <300k of elements>
Set<A> collectionBConvertedToA = collectionB.stream().map(item -> item.toA()).collect(toSet())

Set<A> result = Sets.differences(collectionBConvertedToA, collectionA); // very fast first full scan comparison

Set<String> changedNames = result.stream()
     .map(outer -> collectionB.stream()
                               // very slow second full scan comparison
                              .filter(inner -> inner.getKeyA().equals(outer.getKeyA()) 
                                           && inner.getKeyB().equals(outer.getKeyB()) 
                                           && inner.getKeyC().equals(outer.getKeyC()))
                              .findFirst()
                              .map(item -> item.getName()))
     .collect(toSet());
log.info("changed names" + changedNames);

Guava Sets.differences can find differences on Sets >300k in less than 1/10 of second but later on I still have full scan anyway to collect names.

I am just guessing, but is there something like

Set<B> result = Sets.differences(setA, setB, a -> a.customHashCode(), b -> b.customHashCode(), (a, b) -> a.customEquals(b))

with custom hashCode and custom equals methods to keep Sets efficiency or there is some better pattern to make such comparison as I believe it seems like common problem ?

EDIT I just figured out that I can just flip conversion to extended class

toB() {
  return new B(keyA, keyB, keyC, null, null);
}

but then I need override hashCode and equals to use only those 3 fields and I still believe there is more elegant way


Solution

  • This is O(n^2) since you are streaming collectionB for each element in result. The following should work pretty fast:

    Set<String> changedNames = collectionB.stream()
                                  .filter(b -> collectionA.contains(b.toA())
                                  .map(item -> item.getName()).collect(toSet());