Tags: json, scala, apache-spark, file-format, string-interpolation

How to format Scala's output from JSON to text file format


I am using Scala with Spark, with the following versions:

Scala 2.10.4, Spark 1.2.0

My situation is as follows.

I have an RDD (say, JoinOp) of nested tuples containing case classes, for example:

(123,(null,employeeDetails(Smith,NY,DW))) 
(456,(null,employeeDetails(John,IN,CS)))

This RDD is created by joining two files.

Now, my requirement is to convert this to plain text file format, without any null values and without the case class name (here 'employeeDetails').

My desired output is:

123,Smith,NY,DW
456,John,IN,CS

I have tried string interpolation for this, but with only partial success.

val textOp = JoinOp.map{jm => s"${jm._1},${jm._2._2}"}

If I print textOp, it gives the output below:

123,employeeDetails(Smith,NY,DW)
456,employeeDetails(John,IN,CS)
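The class name appears because interpolating the value falls back to the case class's compiler-generated toString. A minimal illustration, using a capitalized EmployeeDetails with three assumed fields as a stand-in for the class above:

```scala
case class EmployeeDetails(name: String, state: String, dept: String)

val e = EmployeeDetails("Smith", "NY", "DW")

// Interpolating a case class calls its generated toString,
// which prints the class name along with the field values.
println(s"$e")  // EmployeeDetails(Smith,NY,DW)
```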

Now, if I try to access nested elements of the "employeeDetails" case class with string interpolation, it throws an error like the one below:

JoinOp.map{jm => s"${jm._1},${jm._2._2._1}"}.foreach(println)

<console> :23: Error : value _1 is not member of jm

Here I can understand that, with the above syntax, it's unable to access the nested elements of the "employeeDetails" case class.

What might be the solution to this issue? Any help or pointers would be much appreciated.

Many Thanks, Pralay


Solution

  • Case classes have field names, so instead of positional accessors like ._1 you need to use the field name for that position. Assuming the following definition (matching the three fields in the data above):

    case class EmployeeDetails(name: String, state: String, dept: String)
    

    you would access the name field like this:

    JoinOp.map{jm => s"${jm._1},${jm._2._2.name}"}.foreach(println)
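To produce the full desired output, here is a minimal sketch, using a plain List as a stand-in for the RDD and assuming a three-field EmployeeDetails case class. Pattern matching in the map names each part of the nested tuple, which avoids long ._1/._2 chains and simply ignores the null join value:

```scala
case class EmployeeDetails(name: String, state: String, dept: String)

// Stand-in for JoinOp: RDD[(Int, (String, EmployeeDetails))]
val joinOp = List(
  (123, (null: String, EmployeeDetails("Smith", "NY", "DW"))),
  (456, (null: String, EmployeeDetails("John", "IN", "CS")))
)

// Destructure the nested tuple; `_` discards the null join value.
val textOp = joinOp.map { case (id, (_, emp)) =>
  s"$id,${emp.name},${emp.state},${emp.dept}"
}

textOp.foreach(println)
// 123,Smith,NY,DW
// 456,John,IN,CS
```

On the actual RDD the same map works unchanged, and textOp.saveAsTextFile(...) would then write these lines out as a text file.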