
Scala Stream prepend returns List instead of Stream


I have a Seq, x, and a Stream, y, and I wish to prepend x to y to obtain a new Stream. However, the static type of y is causing the Stream to be evaluated immediately, and I am confused why this is the case. Here is an example:

val x: Seq[Int] = Seq(1, 2, 3)
val y: Seq[Int] = Stream(4, 5, 6)
val z = x ++: y // z has dynamic type List instead of Stream

Since the ++: method is called on a Stream instance, I expect to get a Stream as a result, but instead I am getting a List as a result. Can someone please explain why this is happening?


Solution

  • tl;dr

    It's because of how the compiler resolves the implicit CanBuildFrom from static types. When you use ++: on two values statically typed as Seq, it picks Seq's default CanBuildFrom and simply builds another Seq. The default Seq builder is a mutable.ListBuffer, whose result is a List[A] (which is also a Seq). So the builder breaks laziness: the runtime value is a List[Int], while the static type is Seq[Int].

    Problem investigation

    Let's look at the ++: signature (for example, in Scala 2.12.10):

    def ++:[B >: A, That](that: TraversableOnce[B])(implicit bf: CanBuildFrom[Repr, B, That]): That = {
      val b = bf(repr)
      if (that.isInstanceOf[IndexedSeqLike[_, _]]) b.sizeHint(this, that.size)
      b ++= that
      b ++= thisCollection
      b.result
    }
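In 2.12, TraversableLike also defines an overload ++:(that: Traversable[B]), which the compiler prefers whenever the prefix is a Traversable such as x. One way to see which path runs is to prepend the same elements once as a Seq and once as an Iterator, which is only TraversableOnce and therefore must go through the method quoted above. A minimal sketch, assuming Scala 2.12 (CanBuildFrom no longer exists in 2.13); the object name OverloadDemo is hypothetical:

```scala
object OverloadDemo {
  val x: Seq[Int] = Seq(1, 2, 3)
  val y: Seq[Int] = Stream(4, 5, 6)

  // x is a Traversable, so the ++:(that: Traversable[B]) overload is selected.
  // Its builder does not come from y, so the result is built strictly.
  val viaSeq: Seq[Int] = x ++: y

  // An Iterator is only TraversableOnce, so the overload quoted above runs:
  // bf(repr) lets y (a Stream at runtime) supply its own builder.
  val viaIterator: Seq[Int] = x.iterator ++: y
}
```

Both results are statically typed Seq[Int], but with the Seq prefix the runtime value is a List, while with the Iterator prefix it stays a Stream.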
    

    Here we see the implicit argument bf: CanBuildFrom[Repr, B, That]. In the line:

    val b = bf(repr) // b is Builder[B, That]
    

    Here CanBuildFrom.apply is called; it returns a Builder[B, That]:

    trait CanBuildFrom[-From, -Elem, +To] {
      def apply(from: From): Builder[Elem, To]
      def apply(): Builder[Elem, To]
    }
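The two factory methods of CanBuildFrom behave very differently for the default Seq factory: apply(from) asks the source collection for its own builder, while apply() ignores any source and uses the factory's newBuilder. A minimal sketch, assuming Scala 2.12 (the object name CbfDemo is hypothetical), that calls both on the CanBuildFrom the compiler finds for Seq[Int]:

```scala
import scala.collection.generic.CanBuildFrom

object CbfDemo {
  val y: Seq[Int] = Stream(4, 5, 6)

  // The implicit the compiler resolves for a source statically typed as Seq[Int].
  val cbf: CanBuildFrom[Seq[Int], Int, Seq[Int]] =
    implicitly[CanBuildFrom[Seq[Int], Int, Seq[Int]]]

  // apply(): no source collection is consulted, so the builder comes from Seq.newBuilder.
  val fromScratch: Seq[Int] = (cbf() ++= Seq(1, 2, 3)).result()

  // apply(from): the source supplies its own builder (y.genericBuilder),
  // which for a Stream is Stream's lazy builder.
  val fromStream: Seq[Int] = (cbf(y) ++= Seq(1, 2, 3)).result()
}
```

fromScratch comes out as a List, while fromStream is a Stream, even though both go through the same CanBuildFrom instance.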
    

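    When we call ++: on two values statically typed as Seq[Int], the implicit CanBuildFrom that gets picked is the default one from scala.collection.Seq. One detail matters here: because x is itself a Seq, and therefore a Traversable, overload resolution actually selects the sibling overload ++:(that: Traversable[B]) rather than the TraversableOnce one quoted above. In 2.12 that overload is implemented as (that ++ seq)(breakOut), and breakOut calls the no-argument bf() instead of bf(repr). So the builder comes from the factory's newBuilder rather than from y's runtime class (bf(repr) would have asked y for its own builder via genericBuilder):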

    object Seq extends SeqFactory[Seq] {
      /** $genericCanBuildFromInfo */
      implicit def canBuildFrom[A]: CanBuildFrom[Coll, A, Seq[A]] = ReusableCBF.asInstanceOf[GenericCanBuildFrom[A]]
    
      def newBuilder[A]: Builder[A, Seq[A]] = immutable.Seq.newBuilder[A]
    }
    

    We see that newBuilder delegates to immutable.Seq.newBuilder, defined in scala.collection.immutable.Seq:

    object Seq extends SeqFactory[Seq] {
      /** $genericCanBuildFromInfo */
      implicit def canBuildFrom[A]: CanBuildFrom[Coll, A, Seq[A]] = ReusableCBF.asInstanceOf[GenericCanBuildFrom[A]]
      def newBuilder[A]: Builder[A, Seq[A]] = new mutable.ListBuffer
    }
    

    Here the builder is a mutable.ListBuffer, which is strict: appending a Stream to it evaluates every element, and its result is a List.
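The difference between a strict and a lazy builder can be observed with a stream that counts how many of its elements have been computed. A minimal sketch, assuming Scala 2.12; Counted and StrictnessDemo are hypothetical names:

```scala
object StrictnessDemo {
  // Hypothetical helper: a 3-element Stream that counts computed elements.
  final class Counted {
    var forced = 0
    // Stream.map computes the head eagerly, so forced is 1 right after construction.
    val stream: Stream[Int] = Stream.from(4).take(3).map { i => forced += 1; i }
  }

  // Seq.newBuilder is a ListBuffer in 2.12: appending traverses the whole Stream.
  val strict = new Counted
  val viaSeqBuilder: Seq[Int] = (Seq.newBuilder[Int] ++= strict.stream).result()

  // Stream.newBuilder only records the appended part; its tail stays unevaluated.
  val lazyOne = new Counted
  val viaStreamBuilder: Stream[Int] = (Stream.newBuilder[Int] ++= lazyOne.stream).result()
}
```

After building, strict.forced is 3 and the result is a List, whereas lazyOne's stream has not been traversed beyond its head.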

    Fix

    So, to keep the concatenation lazy, you can supply your own CanBuildFrom that produces a Stream[Int], something like this:

    import scala.collection.generic.CanBuildFrom
    import scala.collection.mutable.Builder
    
    val x: Seq[Int] = Seq(1, 2, 3)
    val y: Seq[Int] = Stream(4, 5, 6)
    
    // Both factory methods hand out Stream's own lazy builder, so appending
    // x and y records them without traversing either one.
    implicit val cbf: CanBuildFrom[Seq[Int], Int, Stream[Int]] =
      new CanBuildFrom[Seq[Int], Int, Stream[Int]] {
        override def apply(from: Seq[Int]): Builder[Int, Stream[Int]] = Stream.newBuilder[Int]
        override def apply(): Builder[Int, Stream[Int]] = Stream.newBuilder[Int]
      }
    val z = x ++: y // now z is a Stream[Int]: Stream(1, ?)
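To check that the result really stays lazy, the suffix can be made infinite: any builder that traverses y would then never return. A minimal sketch, assuming Scala 2.12; ScopedCbfDemo and streamCbf are hypothetical names, and the implicit is confined to a block so it cannot affect other ++: calls in the enclosing scope:

```scala
import scala.collection.generic.CanBuildFrom
import scala.collection.mutable.Builder

object ScopedCbfDemo {
  // Hypothetical name: every builder it hands out is Stream's lazy builder.
  val streamCbf: CanBuildFrom[Seq[Int], Int, Stream[Int]] =
    new CanBuildFrom[Seq[Int], Int, Stream[Int]] {
      def apply(from: Seq[Int]): Builder[Int, Stream[Int]] = Stream.newBuilder[Int]
      def apply(): Builder[Int, Stream[Int]] = Stream.newBuilder[Int]
    }

  val x: Seq[Int] = Seq(1, 2, 3)
  // Infinite suffix: a strict builder would never finish appending it.
  val y: Seq[Int] = Stream.from(4)

  // The implicit exists only inside this block.
  val z: Stream[Int] = {
    implicit val cbf: CanBuildFrom[Seq[Int], Int, Stream[Int]] = streamCbf
    x ++: y
  }
}
```

Because building z never walks y, taking a finite prefix such as z.take(5) terminates.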
    

    Or you can simply convert both sequences to streams:

    val x: Seq[Int] = Seq(1, 2, 3)
    val y: Seq[Int] = Stream(4, 5, 6)
    val z = x.toStream ++: y.toStream
    

    and the compiler will then find the implicit CanBuildFrom in the Stream companion object, which builds a lazy Stream:

    implicit def canBuildFrom[A]: CanBuildFrom[Coll, A, Stream[A]] = new StreamCanBuildFrom[A]
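The same infinite-suffix check works for the toStream variant: once both operands are statically Streams, the concatenation never traverses y. A minimal sketch, assuming Scala 2.12 (the object name ToStreamDemo is hypothetical):

```scala
object ToStreamDemo {
  val x: Seq[Int] = Seq(1, 2, 3)
  // An infinite Stream: building it into a List would never terminate.
  val y: Seq[Int] = Stream.from(4)

  // Both operands are statically Streams, so Stream's CanBuildFrom is used
  // and the result is a lazily concatenated Stream[Int].
  val z: Stream[Int] = x.toStream ++: y.toStream
}
```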