Splitting List into sublists along elements


I have this list (List<String>):

["a", "b", null, "c", null, "d", "e"]

And I'd like something like this:

[["a", "b"], ["c"], ["d", "e"]]

In other words, I want to split my list into sublists using the null value as a separator, in order to obtain a list of lists (List<List<String>>). I'm looking for a Java 8 solution. I've tried Collectors.partitioningBy, but I'm not sure it is what I'm looking for. Thanks!


Solution

  • The only solution I have come up with so far is to implement a custom collector.

    Before reading the solution, a few notes. I took this question more as a programming exercise, and I'm not sure whether it can be done with a parallel stream.

    The collector below keeps state outside the accumulation container, so with a plain merging combiner it would silently produce wrong results if the pipeline were run in parallel.

    That is not desirable and should be avoided. This is why the combiner throws an exception (instead of (l1, l2) -> {l1.addAll(l2); return l1;}): the combiner is what a parallel stream calls to merge two partial lists, so you get an exception instead of a wrong result.

    It is also not very efficient because each group is copied (although the copy uses a native method on the underlying array).

    So here's the collector implementation:

    private static Collector<String, List<List<String>>, List<List<String>>> splitBySeparator(Predicate<String> sep) {
        final List<String> current = new ArrayList<>();
        return Collector.of(() -> new ArrayList<List<String>>(),
            (l, elem) -> {
                if (sep.test(elem)) {
                    l.add(new ArrayList<>(current));
                    current.clear();
                }
                else {
                    current.add(elem);
                }
            },
            (l1, l2) -> {
                throw new RuntimeException("Should not run this in parallel");
            },
            l -> {
                // Flush the last group, which has no trailing separator.
                if (!current.isEmpty()) {
                    l.add(current);
                }
                return l;
            });
    }
    

    and how to use it:

    List<List<String>> ll = list.stream().collect(splitBySeparator(Objects::isNull));
    

    Output:

    [[a, b], [c], [d, e]]
    


    Now that Joop Eggen's answer is out, it appears that it can be done in parallel (give him credit for that!). With that idea, the custom collector implementation reduces to:

    private static Collector<String, List<List<String>>, List<List<String>>> splitBySeparator(Predicate<String> sep) {
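        return Collector.of(
            () -> new ArrayList<List<String>>(Arrays.asList(new ArrayList<>())),
            // A separator opens a new group; any other element joins the last group.
            (l, elem) -> { if (sep.test(elem)) l.add(new ArrayList<>()); else l.get(l.size() - 1).add(elem); },
            // Merge l2's first group into l1's last group, then append the rest of l2.
            (l1, l2) -> { l1.get(l1.size() - 1).addAll(l2.remove(0)); l1.addAll(l2); return l1; });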
    }
    

    which makes the paragraph about parallelism somewhat obsolete; I have left it in because it can still be a good reminder.
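    For reference, here is a self-contained sketch that runs the reduced collector on a parallel stream. The generic type parameter, the class name ParallelSplitDemo, and the helper splitOnNulls are additions for this example and not part of the original answer.

    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.Objects;
    import java.util.function.Predicate;
    import java.util.stream.Collector;

    class ParallelSplitDemo {

        // Generic variant of the reduced collector (the type parameter T is an assumption of this sketch).
        static <T> Collector<T, List<List<T>>, List<List<T>>> splitBySeparator(Predicate<? super T> sep) {
            return Collector.of(
                // Every partial result starts with one empty group.
                () -> {
                    List<List<T>> groups = new ArrayList<>();
                    groups.add(new ArrayList<>());
                    return groups;
                },
                // A separator opens a new group; any other element joins the last group.
                (groups, elem) -> {
                    if (sep.test(elem)) {
                        groups.add(new ArrayList<>());
                    } else {
                        groups.get(groups.size() - 1).add(elem);
                    }
                },
                // The first group of the right part continues the last group of the left part.
                (left, right) -> {
                    left.get(left.size() - 1).addAll(right.remove(0));
                    left.addAll(right);
                    return left;
                });
        }

        // Hypothetical helper: splits on null elements using a parallel stream.
        static List<List<String>> splitOnNulls(List<String> input) {
            return input.parallelStream().collect(splitBySeparator(Objects::isNull));
        }

        public static void main(String[] args) {
            List<String> list = Arrays.asList("a", "b", null, "c", null, "d", "e");
            System.out.println(splitOnNulls(list)); // prints [[a, b], [c], [d, e]]
        }
    }
    ```

    Because the stream is ordered and the combiner stitches neighbouring partial results together, the grouping should not depend on how the input happens to be split across threads.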


    Note that the Stream API is not always a substitute for a plain loop. Some tasks are easier and better suited to streams, and some are not. In your case, you could also write a utility method:

    private static <T> List<List<T>> splitBySeparator(List<T> list, Predicate<? super T> predicate) {
        final List<List<T>> finalList = new ArrayList<>();
        int fromIndex = 0;
        int toIndex = 0;
        for(T elem : list) {
            if(predicate.test(elem)) {
                finalList.add(list.subList(fromIndex, toIndex));
                fromIndex = toIndex + 1;
            }
            toIndex++;
        }
        if(fromIndex != toIndex) {
            finalList.add(list.subList(fromIndex, toIndex));
        }
        return finalList;
    }
    

    and call it like List<List<String>> list = splitBySeparator(originalList, Objects::isNull);.

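    It can be improved to handle edge cases: as written, a leading separator or two consecutive separators produce an empty sublist, and the returned sublists are views backed by the original list rather than copies.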
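
    One way to handle such edge cases is sketched below. It assumes that empty groups should be dropped and that each group should be an independent copy; the question does not say either way, so treat both as assumptions. The names SplitEdgeCases and splitSkippingEmpty are hypothetical.

    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.Objects;
    import java.util.function.Predicate;

    class SplitEdgeCases {

        // Hypothetical variant of the utility method: copies each group and skips empty ones.
        static <T> List<List<T>> splitSkippingEmpty(List<T> list, Predicate<? super T> isSeparator) {
            List<List<T>> result = new ArrayList<>();
            List<T> current = new ArrayList<>();
            for (T elem : list) {
                if (isSeparator.test(elem)) {
                    // Close the current group only if it holds something.
                    if (!current.isEmpty()) {
                        result.add(current);
                        current = new ArrayList<>();
                    }
                } else {
                    current.add(elem);
                }
            }
            // Flush the last group when the list does not end with a separator.
            if (!current.isEmpty()) {
                result.add(current);
            }
            return result;
        }

        public static void main(String[] args) {
            List<String> input = Arrays.asList(null, "a", null, null, "b", "c", null);
            System.out.println(splitSkippingEmpty(input, Objects::isNull)); // prints [[a], [b, c]]
        }
    }
    ```

    Copying costs a little more memory than subList, but the result stays valid if the original list is modified later.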