java, java-stream

Is there a more advanced way to group two collections in Java, such as with the Stream API?


I have two lists: one contains categories, and the other contains more detailed entries, such as:

List<String> categoryList = new ArrayList<>(3);
categoryList.add("cat");
categoryList.add("dog");
categoryList.add("bull");
categoryList.add("other");

List<String> detailList = new ArrayList<>();
detailList.add("cat a");
detailList.add("cat b");
detailList.add("dog a");
detailList.add("dog b");
detailList.add("dog c");
detailList.add("bull a");
detailList.add("bird a");
detailList.add("bird b");

Map<String, List<String>> map = new HashMap<>();
for (String category : categoryList) {
    map.put(category,new ArrayList<>());
}

boolean isFind = false;
for (String detail : detailList) {
    isFind = false;
    for (String category : categoryList) {
        if (StrUtil.containsIgnoreCase(detail, category)) {
            map.get(category).add(detail);
            isFind = true;
            break;
        }
    }
    if (!isFind) {
        map.get("other").add(detail);
    }
}
System.out.println(map);

The output is: {other=[bird a, bird b], cat=[cat a, cat b], dog=[dog a, dog b, dog c], bull=[bull a]}

I use a loop, but I wonder whether there is a more advanced way to do it. Thanks.


Solution

  • There are multiple ways to achieve this.

    Collectors#groupingBy and List#contains

    Assuming your details and categories always have the same structure (the category is the first word of the detail, followed by a space):

    Map<String, List<String>> result = detailList.stream()
            .collect(Collectors.groupingBy(detail -> {
                String category = detail.substring(0, detail.indexOf(' '));
                return categoryList.contains(category) ? category : "other";
            }));
    

    However, List#contains performs poorly when the category list is large, because each lookup has a time complexity of O(n). So I would advise going with the next one.

    Collectors#groupingBy and Set#contains

    HashSet<String> categories = new HashSet<>(categoryList);
    
    Map<String, List<String>> result = detailList.stream()
            .collect(Collectors.groupingBy(detail -> {
                String category = detail.substring(0, detail.indexOf(' '));
                return categories.contains(category) ? category : "other";
            }));
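
    The Set-based grouping above can be put into a self-contained sketch. The class name StreamGrouping and the helper method group are hypothetical, and the guard for a detail without a space is an added assumption, not part of the original snippet. A TreeMap is used here only to make the printed key order predictable. Note that, unlike the loop in the question, groupingBy produces no entry for a category that has no details.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class StreamGrouping {

    // Hypothetical helper: groups each detail under its first word,
    // or under "other" when that word is not a known category.
    static Map<String, List<String>> group(List<String> categoryList, List<String> detailList) {
        Set<String> categories = new HashSet<>(categoryList);
        return detailList.stream()
                .collect(Collectors.groupingBy(detail -> {
                    int space = detail.indexOf(' ');
                    // Assumption: a detail without a space is treated as a single word.
                    String firstWord = space < 0 ? detail : detail.substring(0, space);
                    return categories.contains(firstWord) ? firstWord : "other";
                }, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> categoryList = Arrays.asList("cat", "dog", "bull", "other");
        List<String> detailList = Arrays.asList(
                "cat a", "cat b", "dog a", "dog b", "dog c", "bull a", "bird a", "bird b");
        // prints {bull=[bull a], cat=[cat a, cat b], dog=[dog a, dog b, dog c], other=[bird a, bird b]}
        System.out.println(group(categoryList, detailList));
    }
}
```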
    

    If you prefer looping over using the Stream API, however, you would want something a bit more performant than what you have there.

    It should also be a bit more robust.

    Here are two problems I notice with your current code:

    • The time complexity is O(m × n), while it could be O(m + n), assuming both lists are sorted alphabetically
    • You're using a contains check (StrUtil.containsIgnoreCase) to decide which category a detail belongs to, so caterpillars could end up in the cat category
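
    Both problems can be addressed in a loop-based version. The following is a minimal sketch, not the only way to do it: instead of relying on sorted lists, it swaps in a HashSet lookup, which gives roughly O(m + n) on average without any ordering requirement, and it matches on the exact first word rather than on a substring. The class name LoopGrouping and the method group are hypothetical. Matching here is case-sensitive, unlike containsIgnoreCase in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LoopGrouping {

    // Hypothetical helper: one pass over the categories, one pass over the details.
    static Map<String, List<String>> group(List<String> categoryList, List<String> detailList) {
        Set<String> categories = new HashSet<>(categoryList);

        // Pre-populate so every category appears in the result, even when it stays empty.
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String category : categoryList) {
            result.put(category, new ArrayList<>());
        }
        result.putIfAbsent("other", new ArrayList<>());

        for (String detail : detailList) {
            int space = detail.indexOf(' ');
            // Assumption: the category is the first word of the detail.
            String firstWord = space < 0 ? detail : detail.substring(0, space);
            // Exact match on the first word, so "caterpillar a" does not land in "cat".
            String key = categories.contains(firstWord) ? firstWord : "other";
            result.get(key).add(detail);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> categoryList = Arrays.asList("cat", "dog", "bull", "other");
        List<String> detailList = Arrays.asList("cat a", "cat b", "dog a", "caterpillar a", "bird a");
        // prints {cat=[cat a, cat b], dog=[dog a], bull=[], other=[caterpillar a, bird a]}
        System.out.println(group(categoryList, detailList));
    }
}
```

    The LinkedHashMap keeps the categories in the order of categoryList; a plain HashMap would work equally well if the order does not matter.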