Search code examples
javajava-8java-streamguava

Guava's Streams::findLast implementation


I am looking into the implementation of Streams::findLast from Guava and while trying to understand it, there were a couple of things that simply I could not grasp. Here is it's implementation:

public static <T> java.util.Optional<T> findLast(Stream<T> stream) {
    class OptionalState {

        boolean set = false;
        T value = null;

        void set(@Nullable T value) {
            set = true;
            this.value = value;
        }

        T get() {
            checkState(set);
            return value;
        }
    }
    OptionalState state = new OptionalState();

    Deque<Spliterator<T>> splits = new ArrayDeque<>();
    splits.addLast(stream.spliterator());

    while (!splits.isEmpty()) {
        Spliterator<T> spliterator = splits.removeLast();

        if (spliterator.getExactSizeIfKnown() == 0) {
            continue; // drop this split
        }

        // Many spliterators will have trySplits that are SUBSIZED even if they are not themselves
        // SUBSIZED.
        if (spliterator.hasCharacteristics(Spliterator.SUBSIZED)) {
            // we can drill down to exactly the smallest nonempty spliterator
            while (true) {
                Spliterator<T> prefix = spliterator.trySplit();
                if (prefix == null || prefix.getExactSizeIfKnown() == 0) {
                    break;
                } else if (spliterator.getExactSizeIfKnown() == 0) {
                    spliterator = prefix;
                    break;
                }
            }

            // spliterator is known to be nonempty now
            spliterator.forEachRemaining(state::set);
            return java.util.Optional.of(state.get());
        }

        Spliterator<T> prefix = spliterator.trySplit();
        if (prefix == null || prefix.getExactSizeIfKnown() == 0) {
            // we can't split this any further
            spliterator.forEachRemaining(state::set);
            if (state.set) {
                return java.util.Optional.of(state.get());
            }
            // fall back to the last split
            continue;
        }
        splits.addLast(prefix);
        splits.addLast(spliterator);
    }
    return java.util.Optional.empty();
}

In essence the implementation is not that complicated to be honest, but here are the things that I find a bit weird (and I'll take the blame here if this question gets closed as "opinion-based", I understand it might happen).


First of all is the creation of OptionalState class, this could have been replaced with an array of a single element:

T[] state = (T[]) new Object[1];

and used as simple as:

spliterator.forEachRemaining(x -> state[0] = x);

Then the entire method could be split into 3 pieces:

  1. when a certain Spliterator is known to be empty:

    if (spliterator.getExactSizeIfKnown() == 0)

In this case it's easy - just drop it.

  1. then if the Spliterator is known to be SUBSIZED. This is the "happy-path" scenario; as in this case we can split this until we get to the last element. Basically the implementation says: split until the prefix is either null or it's empty (in which case consume the "right" spliterator) or if after a split the "right" spliterator is known to be empty, consume the prefix one. This is done via:

    // spliterator is known to be nonempty now spliterator.forEachRemaining(state::set); return java.util.Optional.of(state.get());

Second question I have is actually about this comment:

// Many spliterators will have trySplits that are SUBSIZED 
// even if they are not themselves SUBSIZED.

This is very interesting, but I could not find such an example, would appreciate if someone would introduce me to one. As a matter of fact, because this comment exists, the code in the next (3-rd part of the method can not be done with a while(true) like the second), because it assumes that after a trySplit we could obtain a Spliterator that is SUBSIZED, even if our initial one was not, so it has to go to the very beginning of findLast.

  1. this part of the method is when a Spliterator is known not to be SUBSIZED and in this case it does not have a known size; thus it relies on how the Spliterator from the source is implemented and in this case actually a findLast makes little sense... for example a Spliterator from a HashSet will return whatever the last entry is in the last bucket...

Solution

    1. When you iterate a Spliterator of an unknown size, you have to track whether an element has been encountered. This can be done by calling tryAdvance and using the return value or by using forEachRemaining with a Consumer which records whether an element has been encountered. When you go the latter route, a dedicated class is simpler than an array. And once you have a dedicated class, why not use it for the SIZED spliterator as well.

      What’s strange to me, is that this local class, which only exists to be used as a Consumer, doesn’t implement Consumer but requires the binding via state::set.

    2. Consider

      Stream.concat(
          Stream.of("foo").filter(s -> !s.isEmpty()),
          Stream.of("bar", "baz"))
      

      The Spliterator representing the entire stream can’t have the SIZED characteristic. But when splitting off the first substream with the unknown size, the remaining stream has a known size.

      Test code:

      Spliterator<String> sp = Stream.concat(
          Stream.of("foo").filter(s -> !s.isEmpty()),
          Stream.of("bar", "baz"))
          .spliterator();
      do {
          System.out.println(
                "SIZED: "+sp.hasCharacteristics(Spliterator.SIZED)
              + ", SUBSIZED: "+sp.hasCharacteristics(Spliterator.SUBSIZED)
              + ", exact size if known: "+sp.getExactSizeIfKnown());
      } while(sp.trySplit() != null);
      

      Result:

      SIZED: false, SUBSIZED: false, exact size if known: -1
      SIZED: true, SUBSIZED: true, exact size if known: 2
      SIZED: true, SUBSIZED: true, exact size if known: 1
      

      But to me, it looks weird when someone tells in a comment to know that splitting can change the characteristics and then doing a pre-test with SUBSIZED, instead of just doing the split and check whether the result has a known size. After all, the code is doing the split anyway, in the alternative branch, when the characteristic is not present. In my old answer, I did the pretest to avoid allocating data structures, but here, the ArrayDeque is always created and used. But I think, even my old answer could be simplified.

    3. I’m not sure what you are aiming at. When a Spliterator has the ORDERED characteristic, the order of traversal and splitting is well-defined. Since HashSet is not ordered, the term “last” is meaningless. If you are radical, you could optimize the operation to just return the first element for unordered streams; that’s valid and much faster.

      What is strange, is this condition:

      if (prefix == null || prefix.getExactSizeIfKnown() == 0) {
          // we can't split this any further
      

      (and a similar loop termination in the SUBSIZED path)

      Just because one prefix happened to have a known zero size, it does not imply that the suffix can’t split further. Nothing in the specification says that.

      As a consequence of this condition, Stream.concat(Stream.of("foo"), Stream.of("bar","baz")) can be handled optimally, whereas for Stream.concat(Stream.of(), Stream.of("bar", "baz")), it will fall back to a traversal, because the first prefix has a known size of zero.