I have a scenario where I need to capture some (not all) of the data from an existing RDD and pass it to another Scala class for the actual operations. Let's look at some example data (empnum, empname, emplocation, empsal) in a text file.
11,John,Paris,1000
12,Daniel,UK,3000
As a first step, I create an RDD of type RDD[String] with the code below:
val empRDD = spark
  .sparkContext
  .textFile("empInfo.txt")
My requirement is to create another RDD that contains only empnum, empname, and emplocation (again of type RDD[String]).
For that I tried the code below, but it gives me an RDD[(String, String, String)] instead.
val empReqRDD = empRDD
  .map(a => a.split(","))
  .map(x => (x(0), x(1), x(2)))
I also tried slice, but that gives me an RDD[Array[String]].
The RDD I need must be of type RDD[String], so that I can pass it to the Scala class that performs the operations.
The expected output is:
11,John,Paris
12,Daniel,UK
Can anyone help me achieve this?
I would try this:
val empReqRDD = empRDD
  .map(a => a.split(","))
  .map(x => (x(0), x(1), x(2)))
val rddString = empReqRDD.map { case (id, name, city) => s"$id,$name,$city" }
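As an alternative, you could probably skip the intermediate tuple and rebuild each line directly with take and mkString. The sketch below assumes every line has at least three comma-separated fields and that no field contains a comma itself. The names KeepFieldsSketch and keepFields are made up for illustration, and a plain Seq stands in for the RDD so the snippet runs without Spark.

```scala
object KeepFieldsSketch {
  // Hypothetical helper: keep only the first n comma-separated fields of a line.
  def keepFields(line: String, n: Int): String =
    line.split(",").take(n).mkString(",")

  def main(args: Array[String]): Unit = {
    // A plain Seq[String] standing in for the RDD[String] read from empInfo.txt.
    val lines = Seq("11,John,Paris,1000", "12,Daniel,UK,3000")

    // Same per-line transformation you would hand to RDD.map.
    val trimmed = lines.map(keepFields(_, 3))

    trimmed.foreach(println)
    // prints:
    // 11,John,Paris
    // 12,Daniel,UK
  }
}
```

With Spark, the same function should drop in as empRDD.map(line => KeepFieldsSketch.keepFields(line, 3)), which stays an RDD[String] because each input line maps to a single output string.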