Tags: json, apache-spark, apache-spark-sql, parquet

How to convert a JSON file to parquet using Apache Spark?


I am new to Apache Spark 1.3.1. How can I convert a JSON file to Parquet?


Solution

  • Spark 1.4 and later

    You can use Spark SQL to first read the JSON file into a DataFrame, then write the DataFrame out as a Parquet file.

    val df = sqlContext.read.json("path/to/json/file")
    df.write.parquet("path/to/parquet/file")
    

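    or, equivalently, through the generic writer API (the older df.save(path, "parquet") form is deprecated since Spark 1.4 and was removed in Spark 2.0):

    df.write.format("parquet").save("path/to/parquet/file")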
    

    See the Spark SQL documentation for examples and more details.
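
    The snippets above assume a ready-made sqlContext, as in the Spark shell. As a minimal sketch of a standalone job, assuming Spark 2.0 or later (where SparkSession is the entry point) and a local master, the conversion could look like the following. The object name JsonToParquet, the helper convert, and the paths are placeholders for illustration, not part of the original answer. Note that by default Spark expects one JSON object per line in the input file.

    ```scala
    import org.apache.spark.sql.{SaveMode, SparkSession}

    object JsonToParquet {
      // Hypothetical helper: reads line-delimited JSON, writes it as Parquet,
      // and returns the number of rows read back from the Parquet output.
      def convert(spark: SparkSession, jsonPath: String, parquetPath: String): Long = {
        val df = spark.read.json(jsonPath) // schema is inferred from the data
        df.write.mode(SaveMode.Overwrite).parquet(parquetPath) // replaces existing output
        spark.read.parquet(parquetPath).count()
      }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("JsonToParquet")
          .master("local[*]") // assumption: local run; drop this when using spark-submit
          .getOrCreate()
        try {
          val rows = convert(spark, "path/to/json/file", "path/to/parquet/file")
          println(s"Wrote $rows rows")
        } finally {
          spark.stop()
        }
      }
    }
    ```

    On Spark 1.4 to 1.6 the same read and write calls are available on sqlContext instead of a SparkSession.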

  • Spark 1.3.1

    val df = sqlContext.jsonFile("path/to/json/file")
    df.saveAsParquetFile("path/to/parquet/file")
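
    In the Spark 1.3.1 shell, sqlContext is created for you. Outside the shell you have to build it yourself. The following is a minimal sketch under that assumption, using only the 1.3-era calls shown above plus sqlContext.parquetFile to read the result back. The object name JsonToParquet131, the helper convert, the local master, and the paths are placeholders for illustration.

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object JsonToParquet131 {
      // Hypothetical helper: converts line-delimited JSON to Parquet with the
      // Spark 1.3.x API and returns the row count read back from the output.
      def convert(sqlContext: SQLContext, jsonPath: String, parquetPath: String): Long = {
        val df = sqlContext.jsonFile(jsonPath) // schema is inferred from the data
        df.printSchema()                       // quick check of the inferred schema
        df.saveAsParquetFile(parquetPath)
        sqlContext.parquetFile(parquetPath).count()
      }

      def main(args: Array[String]): Unit = {
        // assumption: local run; drop setMaster when using spark-submit
        val conf = new SparkConf().setAppName("JsonToParquet131").setMaster("local[*]")
        val sc = new SparkContext(conf)
        try {
          val rows = convert(new SQLContext(sc), "path/to/json/file", "path/to/parquet/file")
          println(s"Wrote $rows rows")
        } finally {
          sc.stop()
        }
      }
    }
    ```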
    

  • Issue related to Windows and Spark 1.3.1

    With Spark 1.3.1, saving a DataFrame as a Parquet file on Windows throws a java.lang.NullPointerException. This is a known issue with that version.

    If you run into it, consider upgrading to a more recent Spark version.