Tags: scala, apache-spark, hdfs

How to count the number of files in an HDFS directory


Hi, I'm using the code below to list the files in an HDFS directory. Now I need to count the number of files returned by that listing, but I'm getting an error.

    import org.apache.hadoop.fs.{FileSystem, Path}

    val hdfspath = "/data/fnl-ecomm/release/*/*/*/schema/*.json"
    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

    fs.globStatus(new Path(hdfspath)).filter(_.isFile).map(_.getPath).foreach(println)

    var count: Int = 0

    if (status != null) {
      while (status.hasNext) {
        status.next
        count += 1
      }
      print("Count is", count)
    } else {
      print("count is zero")
    }

I have tried the code above, but it fails with an error.


Solution

Use globStatus rather than listStatus: listStatus does not expand glob patterns such as */*/*/schema/*.json. Your attempt also fails because status is never assigned, and globStatus returns an Array[FileStatus], not an iterator, so a hasNext loop cannot work on it anyway. You can count the matching files directly with count instead of a manual loop:

    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
    // globStatus expands the wildcard pattern and returns an Array[FileStatus]
    val filesCount = fs.globStatus(new Path(hdfspath)).count(_.isFile)

Note that globStatus returns null when the parent path itself does not exist, so keep a null check if that can happen in your setup.
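
If you prefer the hasNext/next loop from your attempt, that pattern matches the RemoteIterator returned by FileSystem.listFiles. Below is a minimal sketch, assuming the base directory /data/fnl-ecomm/release from your glob; since listFiles does not accept glob patterns, it simply filters on the .json extension:

    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

    // listFiles returns a RemoteIterator[LocatedFileStatus], which supports
    // the hasNext/next pattern; the second argument requests a recursive listing.
    val files = fs.listFiles(new Path("/data/fnl-ecomm/release"), true)

    var count = 0
    while (files.hasNext) {
      if (files.next().getPath.getName.endsWith(".json")) count += 1
    }
    println(s"Count is $count")

This counts every .json file under the base directory, which is broader than the */*/*/schema/*.json pattern; use the globStatus version above when you need to match the pattern exactly.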