I am trying to read data from a zip file.
I can read a whole text file as below:
val f = sc.wholeTextFiles("hdfs://")
but I don't know how to read the text data inside a zip file.
Is there any way to do it? If so, please let me know.
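You can create an RDD from the zip file with the newAPIHadoopFile method. The example below uses ZipFileInputFormat from the third-party com.cotdp.hadoop package, which has to be on the classpath. As far as I know, each record it produces is a pair of (name of the file inside the zip, uncompressed contents of that file).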
import com.cotdp.hadoop.ZipFileInputFormat
import org.apache.hadoop.io.BytesWritable
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.Job
val zipFileRDD = sc.newAPIHadoopFile(
"hdfs://tmp/sample_zip/LoanStats3a.csv.zip",
classOf[ZipFileInputFormat],
classOf[Text],
classOf[BytesWritable],
Job.getInstance().getConfiguration())
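// getBytes returns the backing buffer, which can be longer than the valid data, so limit the string to getLength bytes.
println("The file contents are: " + zipFileRDD.map(s => new String(s._2.getBytes, 0, s._2.getLength)).first())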
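
If adding the third-party input format is not an option, a similar result can probably be had with only the JDK's java.util.zip classes. The following is a minimal sketch, not part of the answer above: the object name ZipText and the helper readZipEntries are hypothetical, and it assumes the entries are UTF-8 text small enough to hold in memory.

```scala
import java.io.{ByteArrayOutputStream, InputStream}
import java.nio.charset.StandardCharsets
import java.util.zip.ZipInputStream

object ZipText {
  // Hypothetical helper: returns (entry name, entry text) for every
  // non-directory entry in a zip stream. Assumes UTF-8 text content.
  def readZipEntries(in: InputStream): List[(String, String)] = {
    val zis = new ZipInputStream(in)
    try {
      Iterator
        .continually(zis.getNextEntry)   // advance to the next entry
        .takeWhile(_ != null)            // null marks the end of the archive
        .filterNot(_.isDirectory)
        .map { entry =>
          // Read the current entry fully; read returns -1 at the end of the entry.
          val out = new ByteArrayOutputStream()
          val buf = new Array[Byte](8192)
          var n = zis.read(buf)
          while (n != -1) {
            out.write(buf, 0, n)
            n = zis.read(buf)
          }
          (entry.getName, new String(out.toByteArray, StandardCharsets.UTF_8))
        }
        .toList                          // force evaluation before the stream is closed
    } finally zis.close()
  }
}
```

With Spark, something like sc.binaryFiles("hdfs://...").flatMap { case (_, stream) => ZipText.readZipEntries(stream.open()) } should then give an RDD of (entry name, text) pairs. Note that each zip file is read in full by a single task, so this suits many small archives better than one very large one.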