
How to generalise implementations of 'Seq[String] => Seq[Int]' and 'Iterator[String] => Iterator[Int]' for file processing?


Suppose I have a function Seq[String] => Seq[Int], e.g. def len(as: Seq[String]): Seq[Int] = as.map(_.length). Now I would like to apply this function to a text file, i.e. transform each line of the file into a number.

I read the text file with scala.io.Source.fromFile("/tmp/xxx.txt").getLines(), which returns an Iterator[String].
I can use toList or to(LazyList) to "convert" the iterator to a Seq, but then the whole file ends up in memory.

So I need to write another function Iterator[String] => Iterator[Int], which is essentially a copy of Seq[String] => Seq[Int]. Is that right? What is the best way to avoid the duplicated code?
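To make the duplication concrete, here is a minimal sketch of the two copies the question describes. The names lenSeq and lenIter are hypothetical; the bodies are identical and only the types differ.

```scala
// Two hand-written copies of the same logic: one for Seq, one for Iterator.
// lenSeq and lenIter are hypothetical names used only for illustration.
object Duplicated {
  // Strict version: operates on a Seq that is already in memory
  def lenSeq(as: Seq[String]): Seq[Int] = as.map(_.length)

  // Lazy version: same body, but because the input is an Iterator,
  // each line is consumed only when the result is consumed
  def lenIter(as: Iterator[String]): Iterator[Int] = as.map(_.length)
}
```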


Solution

  • If you have an arbitrary function Seq[String] => Seq[Int], then using toList or to(LazyList), and so reading the whole file into memory, is the best you can do, because the function may start by looking at the end of the Seq[String], or at its length, etc.

    Scala doesn't let you look "inside" the function and figure out "it's just map(something), so I can apply the same map to an iterator" (there are some caveats with macros, but they aren't really useful here).
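For example, here is a hypothetical Seq[String] => Seq[Int] function (lenMinusCount is an illustrative name, not from the original) that needs the length of the whole input before it can produce its first result. An Iterator can only be traversed once, so running the same logic over file lines would mean holding all of them in memory or reading the file twice.

```scala
object NeedsWholeInput {
  // Hypothetical example: each line's length minus the total number of lines.
  // as.length traverses the entire input before any element is produced,
  // so this cannot be streamed line by line from a single-pass Iterator.
  def lenMinusCount(as: Seq[String]): Seq[Int] = {
    val n = as.length
    as.map(_.length - n)
  }
}
```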


    If you control the definition of the function, you can use higher-kinded types to define a single function that works for both cases. For example, in Scala 2.13:

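    import scala.collection.IterableOnceOps

    // C can be Seq, Iterator, List, ...: IterableOnceOps#map returns a C[Int]
    def len[C[A] <: IterableOnceOps[A, C, C[A]]](as: C[String]): C[Int] = as.map(_.length)

    val x: Seq[Int] = len(Seq("a", "b"))
    val y: Iterator[Int] = len(Iterator("a", "b")) // lazy, like any Iterator map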
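A minimal sketch of applying the generic len to the lines of a file, assuming Scala 2.13 and using scala.util.Using to close the Source. The object name FileLengths, the helper totalLength, and summing the lengths are illustrative choices, not part of the original answer.

```scala
import scala.collection.IterableOnceOps
import scala.io.Source
import scala.util.Using

object FileLengths {
  // Same generic definition as above
  def len[C[A] <: IterableOnceOps[A, C, C[A]]](as: C[String]): C[Int] =
    as.map(_.length)

  // Hypothetical helper: sums the line lengths of a file without
  // materialising all lines. getLines() yields an Iterator[String],
  // so C is inferred as Iterator and lines are read one at a time.
  def totalLength(path: String): Int =
    Using.resource(Source.fromFile(path)) { src =>
      // The Iterator must be fully consumed here (by .sum), because
      // Using.resource closes the Source when this block exits.
      len(src.getLines()).sum
    }
}
```

Note that returning the Iterator[Int] itself out of the Using block would likely fail when it is consumed later, since the underlying reader is already closed; the consuming step has to happen inside the block.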