
How to preserve all subgroups while applying a nested groupingBy collector


I am trying to group a list of employees by gender and department.

How do I ensure all departments are included in a sorted order for each gender, even when the relevant gender count is zero?

Currently, I have the following code and output:

employeeRepository.findAll().stream()
            .collect(Collectors.groupingBy(Employee::getGender, 
                        Collectors.groupingBy(Employee::getDepartment, 
                                              Collectors.counting())));

//output
//{MALE={HR=1, IT=1}, FEMALE={MGMT=1}}
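To see why the zero counts disappear, here is a minimal, self-contained reproduction of the collector above. The names NestedGroupingDemo and countByGenderThenDepartment, the Gender enum and the stand-in Employee class are hypothetical, assumed only to match the getters the question uses. The point it illustrates is that groupingBy() only creates a key for a department that actually occurs for a given gender.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class NestedGroupingDemo {
    enum Gender { FEMALE, MALE }

    // Hypothetical stand-in for the question's Employee class (assumed shape).
    static final class Employee {
        private final String department;
        private final Gender gender;

        Employee(String department, Gender gender) {
            this.department = department;
            this.gender = gender;
        }

        String getDepartment() { return department; }
        Gender getGender() { return gender; }
    }

    // The question's collector: gender -> (department -> count).
    static Map<Gender, Map<String, Long>> countByGenderThenDepartment(List<Employee> employees) {
        return employees.stream()
                .collect(Collectors.groupingBy(Employee::getGender,
                        Collectors.groupingBy(Employee::getDepartment, Collectors.counting())));
    }

    public static void main(String[] args) {
        Map<Gender, Map<String, Long>> result = countByGenderThenDepartment(List.of(
                new Employee("IT", Gender.MALE),
                new Employee("HR", Gender.MALE),
                new Employee("MGMT", Gender.FEMALE)));
        // groupingBy() only creates a key for values that actually occur,
        // so MALE has no MGMT entry and FEMALE has no HR or IT entries.
        System.out.println(result.get(Gender.MALE).containsKey("MGMT"));  // false
        System.out.println(result.get(Gender.FEMALE).size());             // 1
    }
}
```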

Preferred output is:

{MALE={HR=1, IT=1, MGMT=0}, FEMALE={HR=0, IT=0, MGMT=1}}

Solution

  • To achieve that, you have to group by department first, and only then by gender, not the other way around.

    The first collector, groupingBy(Employee::getDepartment, downstream), splits the data set into groups based on department. Its downstream collector, partitioningBy(employee -> employee.getGender() == Employee.Gender.MALE, downstream), then divides the employees of each department into two parts based on gender. Because partitioningBy() always produces entries for both true and false, even when one of the parts is empty, every department gets a count for each gender, including 0. Finally, Collectors.counting(), applied as the innermost downstream collector, provides the number of employees of each gender in every department.
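A small sketch of the property this approach relies on: partitioningBy() returns a map that always contains both a true and a false key, even when one partition is empty. PartitioningDemo and countByIsMale are hypothetical names, and the list of genders stands in for the employees of a single department.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PartitioningDemo {
    enum Gender { FEMALE, MALE }

    // Counts the entries that are MALE (key true) and not MALE (key false).
    static Map<Boolean, Long> countByIsMale(List<Gender> genders) {
        return genders.stream()
                .collect(Collectors.partitioningBy(g -> g == Gender.MALE,
                                                   Collectors.counting()));
    }

    public static void main(String[] args) {
        // A department with only male employees, like IT in the sample data.
        Map<Boolean, Long> it = countByIsMale(List.of(Gender.MALE));
        // The empty "false" partition is present with a count of 0 instead of being absent.
        System.out.println(it.get(false));  // 0
        System.out.println(it.get(true));   // 1
    }
}
```

Contrast this with groupingBy(), which would simply omit a key that no element maps to.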

    So the intermediate map produced by the collect() operation will be of type Map<String, Map<Boolean, Long>> - employee count by gender (Boolean) for each department (for simplicity, department is a plain string).
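For the sample data used in the full example, the intermediate map can be inspected on its own. This is a sketch under assumptions: IntermediateMapDemo, countByDepartmentThenGender and the minimal Employee stand-in are hypothetical names that mirror the demo class shown with the full example.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class IntermediateMapDemo {
    enum Gender { FEMALE, MALE }

    // Hypothetical minimal stand-in for the demo Employee class.
    static final class Employee {
        private final String department;
        private final Gender gender;

        Employee(String department, Gender gender) {
            this.department = department;
            this.gender = gender;
        }

        String getDepartment() { return department; }
        Gender getGender() { return gender; }
    }

    // department -> (isMale -> count), i.e. Map<String, Map<Boolean, Long>>
    static Map<String, Map<Boolean, Long>> countByDepartmentThenGender(List<Employee> employees) {
        return employees.stream()
                .collect(Collectors.groupingBy(Employee::getDepartment,
                        Collectors.partitioningBy(e -> e.getGender() == Gender.MALE,
                                                  Collectors.counting())));
    }

    public static void main(String[] args) {
        Map<String, Map<Boolean, Long>> byDept = countByDepartmentThenGender(List.of(
                new Employee("IT", Gender.MALE),
                new Employee("HR", Gender.MALE),
                new Employee("MGMT", Gender.FEMALE)));
        // MGMT has no male employees, yet the "true" key is present with a count of 0.
        System.out.println(byDept.get("MGMT").get(true));   // 0
        System.out.println(byDept.get("MGMT").get(false));  // 1
    }
}
```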

    The next step is to transform this map into Map<Employee.Gender, Map<String, Long>>: employee count by department for each gender.

    My approach is to create a stream over the entry set and flatten it with flatMap(): every (gender, count) pair of a department is replaced with a new entry whose key is the gender and whose value, in order to preserve the information about the department, is itself an entry with the department as its key and the count as its value.
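The flattening step in isolation, as a sketch: FlattenDemo and flatten are hypothetical names, and the input map is built by hand instead of coming from the first collect().

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class FlattenDemo {
    enum Gender { FEMALE, MALE }

    // department -> (isMale -> count)  ==>  list of gender -> (department, count) entries
    static List<Map.Entry<Gender, Map.Entry<String, Long>>> flatten(
            Map<String, Map<Boolean, Long>> countByDepartment) {
        return countByDepartment.entrySet().stream()
                .flatMap(dep -> dep.getValue().entrySet().stream()
                        // the true/false partition key becomes MALE/FEMALE; the department travels in the value
                        .map(gen -> Map.entry(gen.getKey() ? Gender.MALE : Gender.FEMALE,
                                              Map.entry(dep.getKey(), gen.getValue()))))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Map<Boolean, Long>> countByDepartment = Map.of(
                "IT", Map.of(true, 1L, false, 0L),
                "MGMT", Map.of(true, 0L, false, 1L));
        // One entry per department/gender pair: 2 departments x 2 genders
        System.out.println(flatten(countByDepartment).size());  // 4
    }
}
```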

    Then collect the stream of entries with groupingBy(), using the entry key (the gender) as the classifier. Apply mapping() as a downstream collector to extract the nested entry, and then apply Collectors.toMap() to collect the entries of type Map.Entry<String, Long> into a map.
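A sketch of the regrouping step on its own, assuming the hypothetical names RegroupDemo and regroup. It uses the two-argument toMap(), so the nested department maps it produces have no guaranteed order.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class RegroupDemo {
    enum Gender { FEMALE, MALE }

    // gender -> (department, count) entries  ==>  gender -> (department -> count)
    static Map<Gender, Map<String, Long>> regroup(
            List<Map.Entry<Gender, Map.Entry<String, Long>>> entries) {
        return entries.stream()
                .collect(Collectors.groupingBy(Map.Entry::getKey,             // classify by gender
                        Collectors.mapping(Map.Entry::getValue,               // keep only the nested (department, count) entry
                                Collectors.toMap(Map.Entry::getKey,           // department
                                                 Map.Entry::getValue))));     // count
    }

    public static void main(String[] args) {
        List<Map.Entry<Gender, Map.Entry<String, Long>>> entries = List.of(
                Map.entry(Gender.MALE, Map.entry("IT", 1L)),
                Map.entry(Gender.FEMALE, Map.entry("IT", 0L)),
                Map.entry(Gender.MALE, Map.entry("MGMT", 0L)),
                Map.entry(Gender.FEMALE, Map.entry("MGMT", 1L)));
        Map<Gender, Map<String, Long>> byGender = regroup(entries);
        System.out.println(byGender.get(Gender.FEMALE).get("IT"));  // 0
    }
}
```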

    Regarding the requirement that "all departments are included in a sorted order": to ensure the order of the nested map (count by department), a NavigableMap such as TreeMap should be used.

    To do that, the overload of toMap() that expects a mapFactory has to be used (it also expects a mergeFunction, which isn't really useful for this task since there will be no duplicate keys, but it has to be provided as well).
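A minimal sketch of that toMap() overload on its own; SortedToMapDemo and toSortedMap are hypothetical names. Passing TreeMap::new as the mapFactory makes the result a NavigableMap whose keys iterate in natural (alphabetical) order, regardless of the order of the input entries.

```java
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.stream.Collectors;

class SortedToMapDemo {
    // Collects (department, count) entries into a TreeMap sorted by department name.
    static NavigableMap<String, Long> toSortedMap(List<Map.Entry<String, Long>> entries) {
        return entries.stream()
                .collect(Collectors.toMap(Map.Entry::getKey,
                                          Map.Entry::getValue,
                                          (v1, v2) -> v1,   // merge function: only called on duplicate keys
                                          TreeMap::new));   // map factory
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> entries = List.of(
                Map.entry("MGMT", 0L), Map.entry("IT", 1L), Map.entry("HR", 1L));
        System.out.println(toSortedMap(entries));  // {HR=1, IT=1, MGMT=0}
    }
}
```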

    import java.util.List;
    import java.util.Map;
    import java.util.NavigableMap;
    import java.util.TreeMap;
    import java.util.stream.Collectors;

    public class Main {
        public static void main(String[] args) {
            List<Employee> employeeRepository =
                    List.of(new Employee("IT", Employee.Gender.MALE),
                            new Employee("HR", Employee.Gender.MALE),
                            new Employee("MGMT", Employee.Gender.FEMALE));

            Map<Employee.Gender, NavigableMap<String, Long>> departmentCountByGender = employeeRepository
                    .stream()
                    // Map<String, Map<Boolean, Long>>: department -> employee count by gender (true = MALE)
                    .collect(Collectors.groupingBy(Employee::getDepartment,
                                Collectors.partitioningBy(employee -> employee.getGender() == Employee.Gender.MALE,
                                                          Collectors.counting())))
                    .entrySet().stream()
                    // flatten into entries of gender -> (department, count)
                    .flatMap(entryDep -> entryDep.getValue().entrySet().stream()
                            .map(entryGen -> Map.entry(entryGen.getKey() ? Employee.Gender.MALE : Employee.Gender.FEMALE,
                                                       Map.entry(entryDep.getKey(), entryGen.getValue()))))
                    // regroup by gender; the TreeMap keeps the departments sorted
                    .collect(Collectors.groupingBy(Map.Entry::getKey,
                                Collectors.mapping(Map.Entry::getValue,
                                        Collectors.toMap(Map.Entry::getKey,
                                                         Map.Entry::getValue,
                                                         (v1, v2) -> v1,   // never invoked: departments are unique per gender
                                                         TreeMap::new))));

            System.out.println(departmentCountByGender);
        }
    }
    

    Dummy Employee class used for demo purposes:

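    class Employee {
        enum Gender {FEMALE, MALE}

        private final String department;
        private final Gender gender;

        Employee(String department, Gender gender) {
            this.department = department;
            this.gender = gender;
        }

        public String getDepartment() { return department; }
        public Gender getGender() { return gender; }
    }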
    

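    Output (the order of the outer FEMALE/MALE keys is not guaranteed, because groupingBy() makes no guarantees about the type of Map it returns):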

    {FEMALE={HR=0, IT=0, MGMT=1}, MALE={HR=1, IT=1, MGMT=0}}