scala, csv, hadoop, amazon-emr

Listing files on EMR HDFS in Scala (CSV file missing)


I am trying to list all the files in a directory on EMR HDFS with the following code:

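import java.io.File
import org.apache.hadoop.fs.{LocatedFileStatus, Path, RemoteIterator}

val directory = new File(directoryPath)
// FileUtils.fs is assumed to be a Hadoop FileSystem instance; `true` makes the listing recursive
val fileStatusListIterator: RemoteIterator[LocatedFileStatus] = FileUtils.fs.listFiles(new Path(directoryPath), true)
while (fileStatusListIterator.hasNext) {
  val fileStatus = fileStatusListIterator.next()
  // Log regular files only
  if (fileStatus.isFile) {
    log.info(s"Iterator File Path: ${fileStatus.getPath}")
  }
}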

My problem: it lists everything except the CSV files.


Solution

  • I found the reason: I was downloading this CSV file immediately before the listing, so the download had not finished by the time the iterator ran and the file was not picked up. Therefore, I have to use:

    Await.result(Downloading, Duration.Inf)
    

    This blocks until the download has finished, and only then does the listing continue.
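
The fix described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the original code: a local temporary directory stands in for HDFS, `download` and `listNames` are hypothetical helpers invented for the example, and `scala.jdk.CollectionConverters` assumes Scala 2.13 or later.

```scala
import java.nio.file.{Files, Path}
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.jdk.CollectionConverters._

object AwaitBeforeListing {
  // Hypothetical stand-in for the real download: writes a CSV after a short delay.
  def download(dir: Path): Future[Path] = Future {
    Thread.sleep(200)
    Files.write(dir.resolve("data.csv"), "a,b\n1,2\n".getBytes("UTF-8"))
  }

  // Lists the file names directly under dir (local stand-in for fs.listFiles).
  def listNames(dir: Path): List[String] = {
    val stream = Files.list(dir)
    try stream.iterator().asScala.map(_.getFileName.toString).toList.sorted
    finally stream.close()
  }

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("listing-demo")
    Files.write(dir.resolve("notes.txt"), "hello".getBytes("UTF-8"))

    val downloading = download(dir)
    // Block until the download completes, so the CSV exists before listing.
    Await.result(downloading, 30.seconds)

    println(listNames(dir))
  }
}
```

Without the `Await.result` call, the listing would most likely run before the delayed write and miss `data.csv`. The sketch uses a finite timeout instead of `Duration.Inf` so that a stuck download fails with a `TimeoutException` rather than hanging forever; which is appropriate depends on the job.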