
Java: reason behind the IDENTITY_FINISH characteristic of java.util.stream.Collector


While implementing java.util.stream.Collector, I came across the IDENTITY_FINISH characteristic.

I understand what it does, but not the reasoning behind it. Why not simply use the identity function as the finisher when needed? What use case (requirement) led to the architectural decision to have it?

It doesn't seem to have any performance significance, and if it did, perhaps identity functions in general should be detected and optimized away. I see it rather as an unneeded additional convention that every Collector implementation has to carry.
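To make the question concrete, here is a simplified sketch of where the flag is consulted. The static helper collect below is hypothetical: a sequential imitation of the check that, as far as I know, OpenJDK's stream pipeline performs, returning the accumulation container through an unchecked cast instead of calling finisher() at all. It also shows that the four-argument Collector.of adds IDENTITY_FINISH by itself, while passing Function.identity() as an explicit finisher does not, so the framework cannot tell that finisher apart from any other function. Assumes Java 9+ for List.of.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collector;

public class IdentityFinishDemo {

    // Hypothetical helper: a sequential imitation of how a stream pipeline
    // finishes a collection. If IDENTITY_FINISH is reported, finisher() is never called.
    @SuppressWarnings("unchecked")
    static <T, A, R> R collect(List<T> input, Collector<T, A, R> collector) {
        A container = collector.supplier().get();
        for (T element : input) {
            collector.accumulator().accept(container, element);
        }
        if (collector.characteristics().contains(Collector.Characteristics.IDENTITY_FINISH)) {
            // The flag promises that A is assignable to R, so an unchecked cast is enough.
            return (R) container;
        }
        return collector.finisher().apply(container);
    }

    public static void main(String[] args) {
        // The four-argument Collector.of reports IDENTITY_FINISH automatically.
        Collector<String, List<String>, List<String>> noFinisher =
                Collector.of(ArrayList::new, List::add, (a, b) -> { a.addAll(b); return a; });

        // Passing Function.identity() explicitly does NOT set the flag.
        Collector<String, List<String>, List<String>> explicitIdentity =
                Collector.of(ArrayList::new, List::add, (a, b) -> { a.addAll(b); return a; },
                        Function.identity());

        System.out.println(noFinisher.characteristics());       // [IDENTITY_FINISH]
        System.out.println(explicitIdentity.characteristics()); // []
        System.out.println(collect(List.of("a", "b"), noFinisher)); // [a, b]
    }
}
```

The flag is a declared promise that the framework can read without inspecting or invoking the function, which is what makes it usable by collectors that wrap other collectors.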


Solution

  • Applying the identity function once is negligible, but consider a collector like groupingBy(function), which is shorthand for groupingBy(function, Collectors.toList()).

    Now, if the downstream collector doesn’t have an identity finisher, the groupingBy collector’s finisher must iterate over the resulting map and apply the downstream finisher to every group. So the finisher becomes an operation whose cost scales with the size of the resulting map, even when the downstream finisher itself is a no-op. And since even the default downstream collector, toList(), has an identity finisher, it is desirable to skip this overhead whenever possible.

    It’s worth keeping in mind that the groupingBy collector itself doesn’t need a finishing operation either, so if the downstream collector has an identity finisher, groupingBy can report the IDENTITY_FINISH characteristic too. This matters when groupingBy is itself used as a downstream collector in a further composition.
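A simplified sketch of the propagation described above. simpleGroupingBy is a hypothetical, stripped-down stand-in for Collectors.groupingBy (HashMap only, no map-factory parameter); it is not the JDK's code, though as far as I know the JDK's version follows the same idea. When the downstream collector reports IDENTITY_FINISH, the intermediate Map<K, A> already is the result, so the returned collector reports the flag as well; otherwise its finisher has to make a pass over every group. Assumes Java 9+.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GroupingSketch {

    // Hypothetical, stripped-down groupingBy: always uses a HashMap and
    // exists only to show how IDENTITY_FINISH can be propagated.
    static <T, K, A, D> Collector<T, ?, Map<K, D>> simpleGroupingBy(
            Function<? super T, ? extends K> classifier,
            Collector<? super T, A, D> downstream) {

        Supplier<A> downstreamSupplier = downstream.supplier();
        BiConsumer<A, ? super T> downstreamAccumulator = downstream.accumulator();
        BinaryOperator<A> downstreamCombiner = downstream.combiner();

        // The intermediate container maps each key to the downstream's container A.
        Supplier<Map<K, A>> mapFactory = HashMap::new;
        BiConsumer<Map<K, A>, T> accumulator = (map, element) -> {
            A container = map.computeIfAbsent(classifier.apply(element), key -> downstreamSupplier.get());
            downstreamAccumulator.accept(container, element);
        };
        BinaryOperator<Map<K, A>> combiner = (left, right) -> {
            right.forEach((key, container) -> left.merge(key, container, downstreamCombiner));
            return left;
        };

        if (downstream.characteristics().contains(Collector.Characteristics.IDENTITY_FINISH)) {
            // A and D are the same type, so Map<K, A> already is the result.
            // Report IDENTITY_FINISH too; a conforming framework then skips the finisher.
            @SuppressWarnings("unchecked")
            Function<Map<K, A>, Map<K, D>> unchecked = map -> (Map<K, D>) (Map<K, ?>) map;
            return Collector.of(mapFactory, accumulator, combiner, unchecked,
                    Collector.Characteristics.IDENTITY_FINISH);
        }

        // Otherwise every group must be finished: a pass whose cost grows with
        // the number of groups, no matter how cheap the downstream finisher is.
        Function<A, D> downstreamFinisher = downstream.finisher();
        Function<Map<K, A>, Map<K, D>> finisher = intermediate -> {
            Map<K, D> result = new HashMap<>();
            intermediate.forEach((key, container) -> result.put(key, downstreamFinisher.apply(container)));
            return result;
        };
        return Collector.of(mapFactory, accumulator, combiner, finisher);
    }

    public static void main(String[] args) {
        Collector<String, ?, Map<Integer, List<String>>> byLength =
                simpleGroupingBy(String::length, Collectors.toList());
        Collector<String, ?, Map<Integer, Long>> countByLength =
                simpleGroupingBy(String::length, Collectors.counting());
        Collector<String, ?, Map<Character, List<String>>> byFirstChar =
                simpleGroupingBy(s -> s.charAt(0), Collectors.toList());
        // Composition: the flag survives one grouping level nested in another.
        Collector<String, ?, Map<Integer, Map<Character, List<String>>>> nested =
                simpleGroupingBy(String::length, byFirstChar);

        System.out.println(byLength.characteristics());      // [IDENTITY_FINISH]
        System.out.println(countByLength.characteristics()); // []
        System.out.println(nested.characteristics());        // [IDENTITY_FINISH]
        System.out.println(Stream.of("a", "bb", "cc").collect(byLength));      // {1=[a], 2=[bb, cc]}
        System.out.println(Stream.of("a", "bb", "cc").collect(countByLength)); // {1=1, 2=2}
    }
}
```

Collectors.counting() does not report IDENTITY_FINISH, so grouping with it still needs the per-group pass, while toList() and the nested grouping built on it skip that pass entirely.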