Tags: apache-spark, apache-spark-sql, to-json

How to add comma between JSON elements using Spark Scala


I'm loading table data into a DataFrame and writing it out as multiple JSON part files. The structure of the data is fine, but the JSON objects are not separated by commas.

This is the output:

{"time_stamp":"2016-12-08 01:45:00","Temperature":0.8,"Energy":111111.5,"Net_Energy":1111.3}
{"time_stamp":"2016-12-08 02:00:00","Temperature":21.9,"Energy":222222.5,"Net_Energy":222.0}

I'm supposed to get something like this:

{"time_stamp":"2016-12-08 01:45:00","Temperature":0.8,"Energy":111111.5,"Net_Energy":1111.3},
{"time_stamp":"2016-12-08 02:00:00","Temperature":21.9,"Energy":222222.5,"Net_Energy":222.0}

How do I do this?


Solution

  • Your output is correct JSON Lines output: one JSON record per line, separated by newlines. The lines do not need commas between them; in fact, a comma after each object would make every line (and the file as a whole) invalid JSON. Tools such as spark.read.json consume this format directly.

    If you absolutely need to turn the entire output of a Spark job into a single JSON array of objects, there are two ways to do this:

    1. For data that fits in driver RAM, collect the records as JSON strings on the driver and join them: df.toJSON.collect().mkString("[", ",", "]"). (If the DataFrame already holds JSON strings, df.as[String] works too.) See the first sketch below.

    2. For data that does not fit in driver RAM... you really shouldn't do it... but if you absolutely have to, post-process the part files with shell tools: begin the file with [, append a comma to every line except the last, and end with ]. See the second sketch below.
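
    For option 1, a minimal Scala sketch. The table name (sensor_readings) and the output path are illustrative assumptions, not from the question:

        import java.nio.file.{Files, Paths}
        import org.apache.spark.sql.SparkSession

        val spark = SparkSession.builder().appName("json-array").getOrCreate()

        // Illustrative source; substitute your own DataFrame.
        val df = spark.table("sensor_readings")

        // toJSON renders each Row as one JSON string; collect() pulls every
        // record to the driver, so this only works when the data fits in RAM.
        val jsonArray = df.toJSON.collect().mkString("[", ",", "]")

        // Write the single array from the driver (illustrative output path).
        Files.write(Paths.get("/tmp/output.json"), jsonArray.getBytes("UTF-8"))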
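
    For option 2, a hedged shell sketch, assuming the job wrote newline-delimited part files into an output directory; the directory and file names are illustrative:

        # Begin with [, add a comma to every line except the last, end with ].
        {
          printf '['
          cat output-dir/part-*.json | sed '$!s/$/,/'
          printf ']'
        } > combined.json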