Tags: scala, loops, recursion, functional-programming, nested-loops

Scala: Most efficient way to process files in folder based on a file list


I am trying to find the most efficient way to process files in multiple folders based on a list of allowed files.

I have a list of allowed files that I should process.

The process is as follows:

  1. val allowedFiles = List("File1.json","File2.json","File3.json")
  2. Get list of folders in directory. For this I could use:
      def getListOfSubDirectories(dir: File): List[String] =
            dir.listFiles
               .filter(_.isDirectory)
               .map(_.getName)
               .toList
  3. Loop through each folder from step 2 and get all of its files. For this I would use:
    def getListOfFiles(dir: String):List[File] = {
        val d = new File(dir)
        if (d.exists && d.isDirectory) {
            d.listFiles.filter(_.isFile).toList
        } else {
            List[File]()
        }
    }
  4. If a file from step 3 is in the list of allowed files, call another method that processes the file.

So I need to loop through the top-level directory, get the files in each folder, check whether each file needs to be processed, and then call another function. I was thinking of a double loop, which would work, but is it the most efficient way? I know that in Scala I should be using recursive functions, but I could not get a doubly recursive function with a call to an extra method to work.

Any ideas welcome.


Solution

  • Files.find() does both the depth-first search and the filtering.

    import java.nio.file.{Files,Paths,Path}
    import scala.jdk.StreamConverters._
    
    def getListOfFiles(dir: String, targets:Set[String]): List[Path] =
      Files.find( Paths.get(dir)
                , 999  // maximum directory depth to descend
                , (p, _) => targets(p.getFileName.toString)  // keep paths whose file name is in targets
                ).toScala(List)
    

    Usage:

    val lof = getListOfFiles("/DataDir",  allowedFiles.toSet)
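
    One caveat: the stream returned by Files.find keeps directory handles open until it is closed, and the predicate above also matches a directory whose name happens to be in the set. A minimal sketch that addresses both, assuming Scala 2.13+ for scala.util.Using. The name getListOfFilesSafe and the Try return type are illustrative choices, not part of the code above:

    ```scala
    import java.nio.file.{Files, Path, Paths}
    import scala.jdk.StreamConverters._
    import scala.util.{Try, Using}

    // Hypothetical variant of getListOfFiles: closes the stream and skips directories.
    def getListOfFilesSafe(dir: String, targets: Set[String]): Try[List[Path]] =
      Using(Files.find(
        Paths.get(dir),
        999, // maximum directory depth to descend
        (p, attrs) => attrs.isRegularFile && targets(p.getFileName.toString)
      )) { stream =>
        stream.toScala(List) // materialize the result before the stream is closed
      }
    ```

    Errors such as a missing start directory should then surface as a Failure rather than an exception, so the caller can decide how to handle them.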
    

    Depending on what kind of processing is required, instead of returning a List you might process each file as it is encountered:

    import java.nio.file.{Files,Paths,Path}
    
    def processFile(path: Path): Unit = ???
      
    def processSelected(dir: String, targets:Set[String]): Unit =
      Files.find( Paths.get(dir)
                , 999
                , (p, _) => targets(p.getFileName.toString)
                ).forEach(processFile)
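
    For comparison, the two-level loop described in the question does not need explicit recursion either. It can be written as a single for-comprehension over java.io.File. This is only a sketch, assuming the allowed files sit exactly one level below the top directory. The names children and selectAllowed are hypothetical, and Option(...) guards against listFiles returning null for a missing or unreadable directory:

    ```scala
    import java.io.File

    // Hypothetical helper: directory entries as a List, empty if `dir` cannot be listed.
    def children(dir: File): List[File] =
      Option(dir.listFiles).map(_.toList).getOrElse(Nil)

    // Hypothetical helper: files named in `allowed` that lie directly inside a sub-directory of `root`.
    def selectAllowed(root: File, allowed: Set[String]): List[File] =
      for {
        sub  <- children(root) if sub.isDirectory
        file <- children(sub)  if file.isFile && allowed(file.getName)
      } yield file
    ```

    Each selected file could then be handed to the processing method, for example selectAllowed(new File("/DataDir"), allowedFiles.toSet).foreach(f => processFile(f.toPath)). Unlike Files.find, this version does not descend below the first level of sub-directories.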