I am new to Apache Spark and am using version 1.3.1. How can I convert a JSON file to Parquet?
Spark 1.4 and later
You can use Spark SQL to first read the JSON file into a DataFrame, then write the DataFrame as a Parquet file.
val df = sqlContext.read.json("path/to/json/file")
df.write.parquet("path/to/parquet/file")
or
df.write.format("parquet").save("path/to/parquet/file")
See the Spark SQL programming guide and API documentation for examples and more details.
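The two lines above assume the 1.x spark-shell, where `sc` and `sqlContext` are predefined. For a standalone application, a minimal sketch for Spark 1.4 to 1.6 might look like the following. It assumes `spark-core` and `spark-sql` are on the classpath and uses a local master; the object name `JsonToParquet`, the `demo` helper, and the sample data are hypothetical. Note that Spark's JSON reader expects one JSON object per line, not a single pretty-printed document. The same code should also compile on Spark 2.x, where `SQLContext` is kept (deprecated) for compatibility, although `SparkSession` is the preferred entry point there.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

object JsonToParquet {
  // Converts a small hypothetical sample and returns the number of rows
  // read back from the Parquet output.
  def demo(): Long = {
    // local[*] is for a quick local test; drop setMaster when using spark-submit.
    val conf = new SparkConf().setAppName("JsonToParquet").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val sqlContext = new SQLContext(sc)

      // Hypothetical input file: one JSON object per line.
      val dir = Files.createTempDirectory("json2parquet")
      val jsonPath = dir.resolve("people.json")
      val lines = Seq("""{"name":"Alice","age":30}""", """{"name":"Bob","age":25}""")
      Files.write(jsonPath, lines.mkString("\n").getBytes(StandardCharsets.UTF_8))
      // Parquet output is written as a directory of part files.
      val parquetDir = dir.resolve("people.parquet").toString

      // Read JSON (the schema is inferred), then write Parquet,
      // replacing any previous output at that path.
      val df = sqlContext.read.json(jsonPath.toString)
      df.write.mode(SaveMode.Overwrite).parquet(parquetDir)

      // Read the Parquet data back to check the round trip.
      val parquetDf = sqlContext.read.parquet(parquetDir)
      parquetDf.printSchema()
      parquetDf.count()
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(s"Rows written to Parquet: ${demo()}")
}
```

`SaveMode.Overwrite` is used here so the job can be rerun; without it, the writer's default is to fail if the output path already exists.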
Spark 1.3.1
val df = sqlContext.jsonFile("path/to/json/file")
df.saveAsParquetFile("path/to/parquet/file")
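On 1.3.1 the same round trip uses the older `SQLContext` and `DataFrame` methods shown above, with `parquetFile` to read the result back. A minimal sketch under the same assumptions (local master; the object name, `demo` helper, and sample data are hypothetical). These methods were deprecated in 1.4 and removed in 2.0, so this only compiles against Spark 1.3 to 1.6.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object JsonToParquet131 {
  // Converts a small hypothetical sample with the Spark 1.3 API and returns
  // the number of rows read back from Parquet.
  def demo(): Long = {
    val sc = new SparkContext(
      new SparkConf().setAppName("JsonToParquet131").setMaster("local[*]"))
    try {
      val sqlContext = new SQLContext(sc)

      // Hypothetical input: one JSON object per line.
      val dir = Files.createTempDirectory("json2parquet131")
      val jsonPath = dir.resolve("people.json")
      val lines = Seq("""{"name":"Alice","age":30}""", """{"name":"Bob","age":25}""")
      Files.write(jsonPath, lines.mkString("\n").getBytes(StandardCharsets.UTF_8))
      // The target must not exist yet: by default, saving fails if it does.
      val parquetDir = dir.resolve("people.parquet").toString

      // 1.3 API: jsonFile infers the schema, saveAsParquetFile writes it out.
      val df = sqlContext.jsonFile(jsonPath.toString)
      df.saveAsParquetFile(parquetDir)

      // Read back with the matching 1.3 reader.
      sqlContext.parquetFile(parquetDir).count()
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(s"Rows written to Parquet: ${demo()}")
}
```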
Issue related to Windows and Spark 1.3.1
Saving a DataFrame as a Parquet file on Windows with Spark 1.3.1 throws a java.lang.NullPointerException.
In that case, consider upgrading to a more recent Spark version.