OptaPlanner Constraint Streams: Count Distinct Values in Planning Entity Set

I'm looking for some help with OptaPlanner's constraint streams. The problem is a variant on job-shop scheduling, and my planning entities (CandidateAssignment) are wrapping around two decision variables: choice of robot and assigned time grain. Each CandidateAssignment also has a field (a Set) denoting which physical containers in a warehouse will be filled by assigning that task.

The constraint I'm trying to enforce is to minimize the total number of containers used by all CandidateAssignments in a solution (the goal being to guide OptaPlanner towards grouping tasks by container... there are domain-specific benefits to this in the warehouse). If each CandidateAssignment could only service a single container, this would be easy:

  protected Constraint maximizeContainerCompleteness(ConstraintFactory constraintFactory) {
    return constraintFactory.forEach(CandidateAssignment.class)
        .filter(CandidateAssignment::isAssigned)
        .groupBy(CandidateAssignment::getContainerId, countDistinct())
        .penalizeConfigurable("Group by container");
  }

Moving from a single ID to a collection seems less straightforward to me (i.e., if CandidateAssignment:getContainerIds returns a set of integers). Any help would be much appreciated.

EDIT: Thanks Christopher and Lukáš for the responses. Christopher's constraint matches my use case (minimize the number of containers serviced by a solution). However, this ends up being a pretty poor way to guide OptaPlanner towards (more) optimal solutions since it's operating via iterated local search. Given a candidate solution, the majority of neighbors in that solution's neighborhood will have equal value for that constraint (# unique containers used), so it doesn't have much power of discernment.

The approach I've tested with reasonable results is as follows:

  protected Constraint maximizeContainerCompleteness(ConstraintFactory constraintFactory) {
    return constraintFactory.forEach(CandidateAssignment.class)
        .filter(CandidateAssignment::isAssigned)
        .join(Container.class, Joiners.filtering(
            (candidate, container) -> candidate.getContainerIds().contains(container.getContainerId())))
        .rewardConfigurable("Group by container", (candidate, container) -> container.getPercentFilledSquared());
  }

This is a modified version of Lukáš' answer. It works by prioritizing containers which are "mostly full." In the real-world use case (which I think I explained pretty poorly above), we'd like to minimize the number of containers used in a solution because it allows the warehouse to replace those containers with new ones which are "easier" to fulfill (the search space is less constrained). We're planning in a receding time horizon, and having many partially filled bins means that each planning horizon becomes increasingly more difficult to schedule. "Closing" containers by fulfilling all associated tasks means we can replace that container with a new one and start fresh.

Anyways, just a bit of context. This is a very particular use case, but if anyone else reads this and wants to know how to work with this type of constraint, hopefully that helps.

Solution

Interpreting your constraint as "Penalize by 1 for each container used", this should work:

Constraint maximizeContainerCompleteness(ConstraintFactory constraintFactory) {
    return constraintFactory.forEach(CandidateAssignment.class)
            .filter(CandidateAssignment::isAssigned)
            .flattenLast(CandidateAssignment::getContainerIds)
            .distinct()
            .penalizeConfigurable("Group by container");
}

What it does: for each assigned candidate assignment, flatten its set of container ids (resulting in a stream of non-distinct used container ids), take the distinct elements of that stream (resulting in a stream of distinct used container ids), and trigger a penalize call for each one.