performance optimization aggregation collectors timefold

UniConstraintStream: Why is there a limit to the number of collectors that can be passed to groupBy?


In the UniConstraintStream interface, is there a reason why the groupBy methods accept only a predefined number of collectors?

What would be the implications of a method like this:

<GroupKey_, ResultContainerB_, ResultB_>
            MultiConstraintStream<GroupKey_, ResultB_> groupBy(
                    Function<A, GroupKey_> groupKeyMapping, UniConstraintCollector<A, ResultContainerB_, ResultB_>[] collectors);

I ask because in our use case we would like to compute several aggregates (4 in total) for a single GroupKey, then filter by comparing them before deciding whether to penalize. The only workaround we have found is to pass a single collector and map its result to an array in which each index holds a different computed value; we then pick values out of that array to compare.

It looks like this:

factory.forEach(OurClass.class)
  .groupBy(OurClass::GroupingKey, toList())
  .map((gk, list) -> {
    //Some Code
    return new int[] { cnt1, sum2, computedVal3, sum4, max5 };
  })
  // Some filter
  .penalize(HardMediumSoftScore.ONE_HARD,
    (dutyDateDateIndex, cptrs) -> {
      int cnt1 = cptrs[0];
      int sum2 = cptrs[1];
      // And so on
      })

It works, but I suspect it is not ideal in terms of performance. It also seems more error-prone and less maintainable (IMO), since it adds a mapping step instead of simply using several collectors.


Solution

  • There is a very unfortunate reason: Java's type system. It doesn't allow us to write a single class that is generic in N type parameters; we can only write separate classes for 1, 2, 3, 4, etc. parameters.

    In order for anything to produce tuples with 5 elements, there'd have to be a whole penta stream API. And hexa stream API, and septa stream API... And the amount of code required to do that is truly immense. We simply decided to end at 4, and then provided functionality such as map() to allow you to condense 4 back to 1.

    This doesn't solve your groupBy problem, though: groupBy is indeed limited to 4 arguments, and if one of them is your group key, only 3 are left for collectors. That limitation can be largely (but not entirely) worked around with the compose() constraint collector.

    As a usage example for the compose() constraint collector, here is the implementation of the average() constraint collector in terms of count() and sum():

    compose(count(), sum(Shift::getLength), (count, sum) -> {         
        if (count == 0) {
            return null;
        } else {
            return sum / (double) count;
        }
    });
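
For readers who want to try the idea without a Timefold project on the classpath, the JDK's Collectors.teeing (Java 12+) follows the same pattern as compose(): two collectors run over the same elements and a function merges their results. The sketch below is only an analogy and not Timefold's API; Shift, Stats and statsPerEmployee are hypothetical names made up for illustration. It returns the aggregates in a record with named fields rather than an array, which avoids the index bookkeeping from the question.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ComposeSketch {
    // Hypothetical stand-in for the planning class from the question.
    record Shift(String employee, int length) {}

    // Hypothetical result holder: named fields instead of array indexes.
    record Stats(long count, int sum, Double average) {}

    // Groups shifts by employee and computes count, sum and average together.
    // teeing() feeds every element to both downstream collectors and merges
    // their results, which is the JDK counterpart of the compose() idea.
    static Map<String, Stats> statsPerEmployee(List<Shift> shifts) {
        return shifts.stream().collect(Collectors.groupingBy(
                Shift::employee,
                Collectors.teeing(
                        Collectors.counting(),
                        Collectors.summingInt(Shift::length),
                        (count, sum) -> new Stats(count, sum,
                                // Same guard as in the average() example above.
                                count == 0 ? null : sum / (double) count))));
    }

    public static void main(String[] args) {
        List<Shift> shifts = List.of(
                new Shift("Ann", 8), new Shift("Ann", 4), new Shift("Bob", 6));
        System.out.println(statsPerEmployee(shifts));
    }
}
```

In a constraint stream, the analogous move would presumably be to have the function passed to compose() return such a holder object, so that later filter and penalize steps read named fields instead of array positions.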