Search code examples
javasortingjava-stream

Does sorting a Java stream before calling allMatch with a predicate that has different costs gain any benefits?


I have implemented a Java stream pipeline where some elements are sorted and checked if all of them fulfill a predicate. The pipeline looks like this:

items.stream()
     .sorted(this::sortByType)
     .allMatch(this::isCompliant);

The validation by the predicate differentiates in costs. Some of the elements just need to check an attribute, for some of them I have to fetch potentially many elements from a database and evaluate them, too.

As I need to know that all of them are true, knowing that at least one is false, is equivalent. If I correctly understand the JavaDoc of the allMatch() operation, it should short curcuit as soon as one element is false.

So, in order to potentially improve the execution speed, I order the elements so that all of the very easy validations are executed first and only if they all succeed, I execute the more costly validations.

That is at least what I think happens. However, my IDE tells me, that sorting before calling allMatch() is redundant, because it does not depend on the sort order.

I wrote some tests which seem to indicate that the sort order is in fact regarded when traversing the pipeline, but maybe my test case is just to small or it just happened to be called correctly.

So my question is: Does allMatch() consider the sort order imposed beforehand? Or does it ignore it?


Solution

  • From your description

    The validation by the predicate differentiates in costs. Some of the elements just need to check an attribute, for some of them I have to fetch potentially many elements from a database and evaluate them, too.

    I created a simple test case. The class Waiter has a single int property that defines the sort order and also how long it takes to evaluate the predicate:

    public class Waiter implements Comparable<Waiter> {
        private final int time;
    
        public Waiter(int time) {
            this.time = time;
        }
    
        public boolean isValid() {
            try {
                Thread.sleep(time);
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
            return time > 10;
        }
    
        @Override
        public int compareTo(Waiter o) {
            return Integer.compare(time, o.time);
        }
    }
    

    The class Sorter creates a few Waiters and outputs the time for allMatch() on either the unsorted or the sorted list of waiters:

    import java.util.List;
    
    public class Sorter {
        public static void main(String[] args) {
            var data = List.of(
                    new Waiter(500),
                    new Waiter(500),
                    new Waiter(500),
                    new Waiter(500),
                    new Waiter(1)
            );
            System.out.printf("Duration sorted: %d ms%n", time(() -> data.stream().sorted().allMatch(Waiter::isValid)));
            System.out.printf("Duration unsorted: %d ms%n", time(() -> data.stream().allMatch(Waiter::isValid)));
            System.out.printf("Duration sorted: %d ms%n", time(() -> data.stream().sorted().allMatch(Waiter::isValid)));
            System.out.printf("Duration unsorted: %d ms%n", time(() -> data.stream().allMatch(Waiter::isValid)));
        }
    
        private static long time(Runnable r) {
            long start = System.currentTimeMillis();
            r.run();
            return System.currentTimeMillis() - start;
        }
    }
    

    And the result is as expected: if the Waiter that whos predicate evaluates to false is the first (i.e. the data is sorted) the allMatch() is evaluated faster:

    Duration sorted: 6 ms
    Duration unsorted: 2021 ms
    Duration sorted: 3 ms
    Duration unsorted: 2030 ms
    

    Note that if I change the last Waiter to also wait 500ms the unsorted variant is always faster (because sorting takes time and doesn't change the output for the case where all predicates match)


    Please remember that this is a very simplistic test and may be skewed by various factors:

    • this is not a proper microbenchmark. There is no warmup and only a two runs of both tests. The numbers might look different after a proper warmup.

    • sorting itself takes time. In this simple case the time to sort the data is small, but your data may differ

      • sorting pays off only if allMatch() can short-circuit (i.e. at least one predicate fails)

      • on the other hand, for the case where all predicates return true you have a performance penalty due to the sorting


    Conclusion:

    • sorting the stream offers benefits if some predicates fail and sorting has the effects that the predicates that fail are moved to the beginning of the sorted stream.

    • for the case where all predicates match there is a penalty for the sorting

    You need to find out which of those two cases better describes your problem.