
Reusable streams from file


How do I create a reusable Stream from a file in Scala? I have a huge file and want to use its contents multiple times, but I may not need to read the whole file.

I have tried something like this, without success:

  import scala.io.Source

  // file iterator
  val f = Source.fromFile("numberSeq.txt").getLines

  // construct stream from file iterator
  def numSeq: Stream[BigInt] = Stream.cons(BigInt(f.next()),numSeq)

  //test
  numSeq take 5 foreach println
  numSeq take 5 foreach println //the stream continues to print next file lines instead of going back to the first line

Solution

  • The simplest way is to call toStream directly on your iterator:

    scala> val f = List(1,2,3,4,5,6,7,8,9,10).toIterator.toStream
    f: scala.collection.immutable.Stream[Int] = Stream(1, ?)
    
    scala> f take 5 foreach println
    1
    2
    3
    4
    5
    
    scala> f take 5 foreach println
    1
    2
    3
    4
    5
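
Applied to the file from the question, the same idea might look like the sketch below. It assumes one integer per line; the object name FileStreamSketch and the helpers writeSample and numbersFrom are hypothetical, and the temporary file only stands in for numberSeq.txt so that the example runs on its own.

```scala
import java.nio.file.{Files, Path}
import scala.io.Source

object FileStreamSketch {
  // Hypothetical helper: writes one number per line to a temporary file
  // that stands in for numberSeq.txt.
  def writeSample(numbers: Seq[Int]): Path = {
    val path = Files.createTempFile("numberSeq", ".txt")
    Files.write(path, numbers.mkString("\n").getBytes("UTF-8"))
    path
  }

  // Hypothetical helper: reads lines lazily and parses each one as a BigInt.
  // toStream memoizes the elements that have been read, so the result can be
  // traversed again from the start. The Source is not closed in this sketch.
  def numbersFrom(path: Path): Stream[BigInt] =
    Source.fromFile(path.toFile).getLines().map(line => BigInt(line.trim)).toStream

  def main(args: Array[String]): Unit = {
    val path = writeSample(1 to 10)
    val numSeq = numbersFrom(path)  // a val, so the same stream is reused
    numSeq.take(5).foreach(println) // first five lines
    numSeq.take(5).foreach(println) // the same five lines again
  }
}
```

Only the lines that are actually requested get read from the file. Note that a Stream keeps every element it has already produced, so holding it in a val also keeps those lines in memory.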
    

    In your concrete case, the problem is that numSeq is a def rather than a val, so every call to numSeq builds a new stream, and each new stream starts from wherever the shared iterator f currently is. You still need the def for the recursive definition, but save its result in a val before using it:

    scala> def numSeq: Stream[BigInt] = Stream.cons(BigInt(f.next()),numSeq)
    numSeq: Stream[BigInt]
    
    scala> val numSeq1 = numSeq
    numSeq1: Stream[BigInt] = Stream(1, ?)
    
    scala> numSeq1 take 5 foreach println
    1
    2
    3
    4
    5
    
    scala> numSeq1 take 5 foreach println
    1
    2
    3
    4
    5
    

    Example of wrong usage (notice numSeq instead of numSeq1):

    scala> numSeq take 5 foreach println
    6
    7
    8
    9
    10
    
    scala> numSeq take 5 foreach println
    java.util.NoSuchElementException: next on empty iterator
      at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
      at scala.collection.Iterator$$anon$2.next(Iterator.scala:37)
      at scala.collection.LinearSeqLike$$anon$1.next(LinearSeqLike.scala:59)
      at .numSeq(<console>:14)
      ... 33 elided
    

    By the way, there is a nicer #:: syntax for cons:

    import Stream._
    def numSeq: Stream[BigInt] = BigInt(f.next()) #:: numSeq
    val numSeq1 = numSeq
    

    Finally, a version with better encapsulation, which keeps the recursive def hidden inside the val:

    val numSeq = { 
      def numSeq: Stream[BigInt] = BigInt(f.next()) #:: numSeq 
      numSeq
    }
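
The recursive definitions above call f.next() unconditionally, so they throw NoSuchElementException once the iterator runs out. A minimal sketch of one way to end the stream cleanly instead is shown below; the object name BoundedStreamSketch and the helper fromLines are hypothetical, and an in-memory iterator stands in for the file's lines.

```scala
object BoundedStreamSketch {
  // Hypothetical helper: builds a memoized Stream from an iterator of lines.
  // The head is read immediately, the tail only when it is first requested,
  // and the stream ends with Stream.empty when the iterator is exhausted.
  def fromLines(lines: Iterator[String]): Stream[BigInt] =
    if (lines.hasNext) BigInt(lines.next()) #:: fromLines(lines)
    else Stream.empty

  def main(args: Array[String]): Unit = {
    // Stand-in for Source.fromFile("numberSeq.txt").getLines
    val numSeq = fromLines(Iterator("1", "2", "3"))
    numSeq.take(5).foreach(println) // only three elements exist, no exception
    numSeq.take(5).foreach(println) // the same three elements again
  }
}
```

Passing the iterator as a parameter also avoids relying on a shared f, so the stream is created exactly once, at the call site that stores it in a val.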