scala actor continuations directory-walk

Recursively walk a LARGE directory using Scala 2.8 continuations


Is it possible to recursively walk a directory using Scala continuations (introduced in 2.8)?

My directory contains millions of files, so I cannot use a Stream because I will run out of memory. I am trying to write an Actor dispatcher so that worker actors can process the files in parallel.

Does anyone have an example?


Solution

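  • If you want to stick with Java 1.6 (as opposed to using FileVisitor in Java 1.7), and your millions of files are spread over subdirectories rather than all sitting in one directory (listFiles reads an entire directory into an array at once), you can use a lazy iterator like this: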

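    import java.io.File

    // Lazily walks the tree below `f` depth-first (pre-order): each directory is
    // returned before its contents. Memory use grows with directory depth and
    // listing size, not with the total number of files.
    class DirectoryIterator(f: File) extends Iterator[File] {
      // listFiles returns null for non-directories and unreadable directories
      private[this] val fs = Option(f.listFiles).getOrElse(Array.empty[File])
      private[this] var i = -1                             // index of the last entry returned
      private[this] var recurse: DirectoryIterator = null  // iterator over the current subdirectory

      def hasNext: Boolean =
        (recurse != null && recurse.hasNext) || i + 1 < fs.length

      def next(): File = {
        if (recurse != null && recurse.hasNext) recurse.next()
        else if (i + 1 >= fs.length)
          throw new java.util.NoSuchElementException("next on empty file iterator")
        else {
          i += 1
          // Return the directory itself now; its contents come from `recurse` next
          if (fs(i).isDirectory) recurse = new DirectoryIterator(fs(i))
          fs(i)
        }
      }
    }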
    

    This requires that your filesystem has no loops (for example, a symbolic link pointing back to an ancestor directory, which isDirectory follows). If it does have loops, you need to keep track of the directories you have already visited in a set and avoid recursing into them again. (If you also don't want to visit files twice when they are linked from two different places, you have to put everything into a set, and then there's not much point in using an iterator instead of just reading all the file info into memory.)
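A minimal sketch of the "keep track of the directories you hit in a set" idea, assuming that a directory's canonical path (`File.getCanonicalPath`) is a good enough identity for loop detection. The class name `SafeDirectoryIterator` is hypothetical, and it swaps the nested iterators for an explicit stack of listings; it keeps the same pre-order as the iterator above and, like it, does not return the root itself.

```scala
import java.io.File
import scala.collection.mutable

// Hypothetical loop-safe variant: a directory whose canonical path was already
// expanded is still returned, but its contents are not listed a second time.
class SafeDirectoryIterator(root: File) extends Iterator[File] {
  // Canonical paths of directories whose contents have been listed
  private[this] val visited = mutable.HashSet.empty[String]
  // Stack of partially consumed directory listings (top = deepest directory)
  private[this] val pending = new java.util.ArrayDeque[Iterator[File]]()
  pending.push(children(root))

  // Entries of `dir`, or nothing if it was seen before or cannot be listed
  private def children(dir: File): Iterator[File] =
    if (!visited.add(dir.getCanonicalPath)) Iterator.empty
    else Option(dir.listFiles).map(_.iterator).getOrElse(Iterator.empty)

  def hasNext: Boolean = {
    // Discard exhausted listings until one with remaining entries is on top
    while (!pending.isEmpty && !pending.peek.hasNext) pending.pop()
    !pending.isEmpty
  }

  def next(): File = {
    if (!hasNext) throw new NoSuchElementException("next on empty file iterator")
    val f = pending.peek.next()
    // Descend after returning the directory, unless it was already expanded
    if (f.isDirectory) pending.push(children(f))
    f
  }
}
```

Files reachable through two different links can still be returned twice; only directory expansion is de-duplicated, which keeps the set proportional to the number of directories rather than files.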
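If Java 1.7 is available, the FileVisitor route mentioned above could look roughly like this. It is a sketch using `Files.walkFileTree` with `SimpleFileVisitor`; the wrapper object `NioWalk` and its `walk` method are hypothetical names. The single-visitor overload of `walkFileTree` does not follow symbolic links, so link loops are not a concern, but it is callback-based (push) rather than an `Iterator`.

```scala
import java.io.IOException
import java.nio.file.{FileVisitResult, Files, Path, SimpleFileVisitor}
import java.nio.file.attribute.BasicFileAttributes

object NioWalk {
  // Calls `onFile` for every non-directory entry under `root`.
  // Symbolic links are not followed by this overload of walkFileTree.
  def walk(root: Path)(onFile: Path => Unit): Unit = {
    Files.walkFileTree(root, new SimpleFileVisitor[Path] {
      override def visitFile(file: Path, attrs: BasicFileAttributes): FileVisitResult = {
        onFile(file)
        FileVisitResult.CONTINUE
      }
      // Skip entries that cannot be read instead of aborting the whole walk
      override def visitFileFailed(file: Path, exc: IOException): FileVisitResult =
        FileVisitResult.CONTINUE
    })
  }
}
```

Because the visitor is called once per file, it can hand each path straight to a worker without ever collecting the full list.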
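For the "worker actors process the files in parallel" part of the question, the important property is that the producer must not pull files out of the iterator faster than the workers consume them, or the backlog ends up in memory anyway. The sketch below swaps actors for a plain `java.util.concurrent.ThreadPoolExecutor` (not Scala actors) with a bounded queue and `CallerRunsPolicy`; the object `ParallelFiles` and its parameters are hypothetical.

```scala
import java.io.File
import java.util.concurrent.{ArrayBlockingQueue, ThreadPoolExecutor, TimeUnit}

object ParallelFiles {
  // Runs `work` on every file from `files` using `workers` threads.
  // The queue holds at most `queueSize` pending tasks; when it is full,
  // CallerRunsPolicy makes the submitting thread run the task itself,
  // which throttles how fast the iterator is consumed.
  // Exceptions thrown by `work` on pool threads are not collected here.
  def processAll(files: Iterator[File], workers: Int, queueSize: Int)(work: File => Unit): Unit = {
    val pool = new ThreadPoolExecutor(
      workers, workers, 0L, TimeUnit.MILLISECONDS,
      new ArrayBlockingQueue[Runnable](queueSize),
      new ThreadPoolExecutor.CallerRunsPolicy())
    try {
      files.foreach { f =>
        pool.execute(new Runnable { def run(): Unit = work(f) })
      }
    } finally {
      // Stop accepting tasks and wait for the queued ones to finish
      pool.shutdown()
      pool.awaitTermination(Long.MaxValue, TimeUnit.NANOSECONDS)
    }
  }
}
```

With the iterator from the answer this might be called as `ParallelFiles.processAll(new DirectoryIterator(new File("/data")), 8, 1000) { f => ... }`, where the path and the sizes are placeholders. Roughly `workers + queueSize` files are in flight at any moment, regardless of how many the tree contains.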