Tags: json, apache-spark, parquet, spark-avro

How to save complex json or complex objects as Parquet in Spark?


I'm new to Spark and I'm trying to figure out if there is a way to save complex (nested) objects or complex JSON as Parquet in Spark. I'm aware of the Kite SDK, but I understand it uses MapReduce.

I looked around but I was unable to find a solution.

Thanks for your help.


Solution

  • case class Address(city: String, block: String)
    case class Person(name: String, age: String, address: Address)
    
    // Nested case classes map directly to Parquet's nested (struct) schema.
    val people = sc.parallelize(List(
      Person("a", "b", Address("a", "b")),
      Person("c", "d", Address("c", "d"))))
    
    // createDataFrame infers the schema via reflection, including the nested Address struct.
    val df = sqlContext.createDataFrame(people)
    df.write.mode("overwrite").parquet("/tmp/people.parquet")
    

    This answer on SO pointed me in the right direction: Spark SQL: Nested classes to parquet error.

    It was hard to find, though, so I've answered my own question here. I hope this helps someone else looking for an example.
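Since the question also asks about complex JSON: Spark SQL can infer a nested schema directly from a JSON source, so no case classes are needed in that path. A minimal sketch, assuming a line-delimited JSON file at a hypothetical path `/tmp/people.json` with the same shape as the `Person` example above (the file path and field names are assumptions, not from the original post):

```scala
// /tmp/people.json, one JSON object per line, e.g.:
//   {"name":"a","age":"b","address":{"city":"a","block":"b"}}

// read.json infers the nested schema (address becomes a struct column)
val jsonDf = sqlContext.read.json("/tmp/people.json")
jsonDf.printSchema()

// Write it out as Parquet; the nested structure is preserved
jsonDf.write.mode("overwrite").parquet("/tmp/people_from_json.parquet")

// Reading the Parquet back keeps the nesting; dotted paths reach nested fields
val restored = sqlContext.read.parquet("/tmp/people_from_json.parquet")
restored.select("name", "address.city").show()
```

This runs in `spark-shell` (Spark 1.x style, matching the `sqlContext` used above; on Spark 2.x+ replace `sqlContext` with `spark`).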