java, java-8, java-stream, collectors

java.util.stream.Collectors: Why is summingInt implemented with an array?


The standard Collector summingInt internally creates an array of length one:

public static <T> Collector<T, ?, Integer>
summingInt(ToIntFunction<? super T> mapper) {
    return new CollectorImpl<>(
            () -> new int[1],
            (a, t) -> { a[0] += mapper.applyAsInt(t); },
            (a, b) -> { a[0] += b[0]; return a; },
            a -> a[0], CH_NOID);
}
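For comparison, here is a minimal sketch of the same single-element-array approach written against the public API. `CollectorImpl` and `CH_NOID` are package-private in `java.util.stream.Collectors`, so this assumes `Collector.of` as a stand-in; the class name `ArraySumming` is hypothetical.

```java
import java.util.function.ToIntFunction;
import java.util.stream.Collector;
import java.util.stream.Stream;

class ArraySumming {
    // Sketch: the single-element-array trick, built with the public Collector.of factory
    // instead of the package-private CollectorImpl used inside the JDK.
    static <T> Collector<T, int[], Integer> summingInt(ToIntFunction<? super T> mapper) {
        return Collector.of(
                () -> new int[1],                          // mutable container holding the running sum
                (a, t) -> a[0] += mapper.applyAsInt(t),    // mutate the container in place
                (a, b) -> { a[0] += b[0]; return a; },     // merge partial sums (parallel streams)
                a -> a[0]);                                // unwrap the final int (boxed to Integer)
    }

    public static void main(String[] args) {
        int total = Stream.of("a", "bb", "ccc").collect(summingInt(String::length));
        System.out.println(total); // prints 6
    }
}
```

The array is the mutable container: the accumulator writes into `a[0]` rather than reassigning `a`, so the change is visible to the stream machinery, which holds a reference to the same array.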

I was wondering whether it would be possible to simply define:

private <T> Collector<T, Integer, Integer> summingInt(ToIntFunction<? super T> mapper) {
    return Collector.of(
            () -> 0,
            (a, t) -> a += mapper.applyAsInt(t),
            (a, b) -> a += b,
            a -> a
    );
}

This, however, doesn't work: the accumulator just seems to be ignored. Can anyone explain this behaviour?
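A small, self-contained reproduction of the behaviour, assuming a sequential stream of `1, 2, 3` (the class name `BrokenSumming` is hypothetical). The collector is the one from the question, only made `static`:

```java
import java.util.function.ToIntFunction;
import java.util.stream.Collector;
import java.util.stream.Stream;

class BrokenSumming {
    // The Integer-based collector from the question, unchanged apart from being static.
    static <T> Collector<T, Integer, Integer> summingInt(ToIntFunction<? super T> mapper) {
        return Collector.of(
                () -> 0,
                (a, t) -> a += mapper.applyAsInt(t), // only reassigns the lambda's own parameter
                (a, b) -> a += b,
                a -> a);
    }

    public static void main(String[] args) {
        Integer result = Stream.of(1, 2, 3).collect(summingInt(Integer::intValue));
        System.out.println(result); // prints 0
    }
}
```

On a sequential stream the supplier is called once and the combiner is not used, so the finisher receives the untouched initial `0`.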


Solution

  • An Integer is immutable, while an int[] array is mutable. The container an accumulator works on is supposed to be stateful: the accumulator must change it in place rather than replace it.


    Imagine you've got 2 references to 2 Integer objects.

    Integer a = 1;
    Integer b = 2;
    

    By nature, the instances you are referring to are immutable: you can't modify them once they have been created.

    Integer a = 1;  // {Integer@479}
    Integer b = 2;  // {Integer@480}
    

    You've decided to use a as an accumulator.

    a += b; 
    

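    The value a now holds satisfies you: it's 3. However, a no longer refers to the {Integer@479} you had at the beginning. For an Integer, a += b compiles to roughly a = Integer.valueOf(a.intValue() + b.intValue()), so the variable is rebound to a different object; the original object is never changed.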

    I added debug statements to your Collector to make things clear.

    import java.util.function.ToIntFunction;
    import java.util.stream.Collector;

    public static <T> Collector<T, Integer, Integer> summingInt(ToIntFunction<? super T> mapper) {
      return Collector.of(
          () -> {
            Integer zero = 0;
            System.out.printf("init [%d (%d)]%n", zero, System.identityHashCode(zero));
            return zero;
          },
          (a, t) -> {
            System.out.printf("-> accumulate [%d (%d)]%n", a, System.identityHashCode(a));
            // rebinds the local parameter 'a' to a new Integer; the container itself is untouched
            a += mapper.applyAsInt(t);
            System.out.printf("<- accumulate [%d (%d)]%n", a, System.identityHashCode(a));
          },
          (a, b) -> a += b,
          a -> a
      );
    }
    

    If you run it on a stream whose elements map to 1, 2 and 3, you'll notice a pattern like

    init [0 (6566818)]
    -> accumulate [0 (6566818)]
    <- accumulate [1 (1029991479)]
    -> accumulate [0 (6566818)]
    <- accumulate [2 (1104106489)]
    -> accumulate [0 (6566818)]
    <- accumulate [3 (94438417)]
    

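    where 0 (6566818), the container created by the supplier and passed to the accumulator every time, never changes despite all the attempts with +=: each += produces a new Integer that is lost as soon as the lambda returns.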

    If you rewrote it to use an AtomicInteger

    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.function.ToIntFunction;
    import java.util.stream.Collector;

    public static <T> Collector<T, AtomicInteger, AtomicInteger> summingInt(ToIntFunction<? super T> mapper) {
      return Collector.of(
          () -> {
            AtomicInteger zero = new AtomicInteger();
            System.out.printf("init [%d (%d)]%n", zero.get(), System.identityHashCode(zero));
            return zero;
          },
          (a, t) -> {
            System.out.printf("-> accumulate [%d (%d)]%n", a.get(), System.identityHashCode(a));
            // mutates the object 'a' refers to, so the change survives the lambda
            a.addAndGet(mapper.applyAsInt(t));
            System.out.printf("<- accumulate [%d (%d)]%n", a.get(), System.identityHashCode(a));
          },
          (a, b) -> { a.addAndGet(b.get()); return a; }
      );
    }
    

    you would see a true accumulator (as part of a mutable reduction) in action:

    init [0 (1494279232)]
    -> accumulate [0 (1494279232)]
    <- accumulate [1 (1494279232)]
    -> accumulate [1 (1494279232)]
    <- accumulate [3 (1494279232)]
    -> accumulate [3 (1494279232)]
    <- accumulate [6 (1494279232)]
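The immutability point above can be reproduced without streams. A minimal sketch (the class and method names are hypothetical): a method that applies `+=` to an `Integer` parameter behaves like the accumulator lambda in the question, while one that writes into an `int[]` behaves like the JDK's container.

```java
class IntegerRebinding {
    // Reassigning a parameter only changes the callee's local copy of the reference.
    static void addTo(Integer acc, int x) {
        acc += x; // roughly: acc = Integer.valueOf(acc.intValue() + x);
    }

    // Writing through an array reference changes the array the caller also sees.
    static void addTo(int[] acc, int x) {
        acc[0] += x;
    }

    static int[] demo() {
        Integer boxed = 0;
        int[] array = new int[1];
        for (int x = 1; x <= 3; x++) {
            addTo(boxed, x); // caller's 'boxed' still refers to 0 afterwards
            addTo(array, x); // caller's array now holds the running sum
        }
        return new int[] {boxed, array[0]};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " " + r[1]); // prints "0 6"
    }
}
```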
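To see why the reassignment inside the accumulator is lost, it helps to model what a sequential `collect` roughly does with the container. This is a simplified sketch, not the JDK's implementation (which is built in `java.util.stream.ReduceOps`); `naiveCollect` is a hypothetical helper.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.function.Supplier;

class NaiveCollect {
    // Hypothetical model of a sequential collect: one container, fed element by element.
    static <T, A, R> R naiveCollect(List<T> items, Supplier<A> supplier,
                                    BiConsumer<A, T> accumulator, Function<A, R> finisher) {
        A container = supplier.get();
        for (T item : items) {
            // accept() returns void, so a reassignment inside the lambda cannot reach 'container'
            accumulator.accept(container, item);
        }
        return finisher.apply(container);
    }

    static int sumWithInteger(List<Integer> items) {
        return NaiveCollect.<Integer, Integer, Integer>naiveCollect(
                items, () -> 0, (a, t) -> a += t, a -> a);
    }

    static int sumWithArray(List<Integer> items) {
        return NaiveCollect.<Integer, int[], Integer>naiveCollect(
                items, () -> new int[1], (a, t) -> a[0] += t, a -> a[0]);
    }

    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(1, 2, 3);
        System.out.println(sumWithInteger(items)); // prints 0
        System.out.println(sumWithArray(items));   // prints 6
    }
}
```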
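The `AtomicInteger` collector in the answer returns the `AtomicInteger` itself. A sketch, assuming you want an `Integer` result like `Collectors.summingInt`, drops the debug output and adds `AtomicInteger::get` as the finisher (the class name `AtomicSumming` is hypothetical):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.ToIntFunction;
import java.util.stream.Collector;
import java.util.stream.IntStream;

class AtomicSumming {
    static <T> Collector<T, AtomicInteger, Integer> summingInt(ToIntFunction<? super T> mapper) {
        return Collector.of(
                AtomicInteger::new,                            // fresh container, starts at 0
                (a, t) -> a.addAndGet(mapper.applyAsInt(t)),   // mutate in place
                (a, b) -> { a.addAndGet(b.get()); return a; }, // fold b into a
                AtomicInteger::get);                           // unwrap to Integer
    }

    public static void main(String[] args) {
        int sum = IntStream.rangeClosed(1, 100).boxed().parallel()
                .collect(summingInt(Integer::intValue));
        System.out.println(sum); // prints 5050
    }
}
```

The atomicity is not really needed here: for a collector without the `CONCURRENT` characteristic, a parallel `collect` gives each subtask its own container and merges them with the combiner, which is why the JDK can get away with a plain `int[]`.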
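Immutable `Integer` values are perfectly usable when the operation is an immutable reduction instead of a mutable one. As a contrast with `collect`, here is a sketch using `Stream.reduce`, where each step returns a new value instead of mutating a container, and `mapToInt(...).sum()`, which avoids boxing altogether (the class name `ImmutableReduction` is hypothetical):

```java
import java.util.Arrays;
import java.util.List;

class ImmutableReduction {
    // reduce() combines values by returning new ones, so immutable Integers are fine here.
    static int sumWithReduce(List<Integer> items) {
        return items.stream().reduce(0, Integer::sum);
    }

    // A primitive IntStream sums without any boxing.
    static int sumWithIntStream(List<Integer> items) {
        return items.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(1, 2, 3);
        System.out.println(sumWithReduce(items));    // prints 6
        System.out.println(sumWithIntStream(items)); // prints 6
    }
}
```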